The stock market has consistently proven to be a good place to invest and save for the future. There are many compelling reasons to invest in stocks: doing so can help fight inflation, create wealth, and provide some tax benefits. Steady returns over a long period can also grow far more than seems possible. Thanks to the power of compounding, the earlier one starts investing, the larger the corpus one can build for retirement. Overall, investing in stocks can help meet life's financial aspirations.
It is important to maintain a diversified portfolio when investing in stocks in order to maximise earnings across market conditions. A diversified portfolio tends to yield higher returns at lower risk because it tempers potential losses when the market is down. It is easy to get lost in a sea of financial metrics while determining the worth of a single stock, and repeating that analysis for a multitude of stocks to identify the right picks for an individual is tedious. Cluster analysis can identify stocks that exhibit similar characteristics as well as stocks that exhibit minimal correlation with one another. This helps investors analyze stocks across different market segments and protect against risks that could make the portfolio vulnerable to losses.
Trade&Ahead is a financial consultancy firm that provides its customers with personalized investment strategies. They have hired you as a Data Scientist and provided you with data comprising the stock price and some financial indicators for a few companies listed on the New York Stock Exchange. They have assigned you the tasks of analyzing the data, grouping the stocks based on the attributes provided, and sharing insights about the characteristics of each group.
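The idea of "similar characteristics" can be made concrete with a distance between stocks. The sketch below is only an illustration, not the analysis performed in this notebook: the four stocks and their two indicator values are made up, and plain NumPy is used in place of any clustering library. It z-scores each indicator and then finds, for every stock, its nearest neighbour in the scaled space.

```python
import numpy as np

# Hypothetical toy data: rows are stocks, columns are two made-up indicators.
# None of these values come from the stock_data.csv file.
names = ["A", "B", "C", "D"]
features = np.array([
    [1.0, 15.0],
    [1.1, 16.0],
    [3.0, 60.0],
    [3.2, 58.0],
])

# z-score each column so indicators on different scales contribute equally
scaled = (features - features.mean(axis=0)) / features.std(axis=0)

# pairwise Euclidean distances between stocks in the scaled space
diff = scaled[:, None, :] - scaled[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# ignore the zero distance of each stock to itself, then take the closest stock
np.fill_diagonal(dist, np.inf)
nearest = {names[i]: names[int(dist[i].argmin())] for i in range(len(names))}
print(nearest)
```

Stocks that end up as each other's nearest neighbours are candidates for the same group, while stocks that are far apart are the ones a diversified portfolio would want to mix.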
import numpy as np
import pandas as pd
#for visualizations
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
#for missing value imputation
from sklearn.impute import SimpleImputer
#for scaling the data using z-score
from sklearn.preprocessing import StandardScaler
from scipy.spatial.distance import cdist, pdist
from scipy.stats import zscore
#for k-means clustering
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer
#for hierarchical clustering
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage, cophenet
import warnings
warnings.filterwarnings('ignore')
#link google drive
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
data = pd.read_csv('/content/drive/MyDrive/Project7/stock_data.csv')
data1 = data.copy()
df = data.copy()
df.head()
| | Ticker Symbol | Security | GICS Sector | GICS Sub Industry | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAL | American Airlines Group | Industrials | Airlines | 42.349998 | 9.999995 | 1.687151 | 135 | 51 | -604000000 | 7610000000 | 11.39 | 6.681299e+08 | 3.718174 | -8.784219 |
| 1 | ABBV | AbbVie | Health Care | Pharmaceuticals | 59.240002 | 8.339433 | 2.197887 | 130 | 77 | 51000000 | 5144000000 | 3.15 | 1.633016e+09 | 18.806350 | -8.750068 |
| 2 | ABT | Abbott Laboratories | Health Care | Health Care Equipment | 44.910000 | 11.301121 | 1.273646 | 21 | 67 | 938000000 | 4423000000 | 2.94 | 1.504422e+09 | 15.275510 | -0.394171 |
| 3 | ADBE | Adobe Systems Inc | Information Technology | Application Software | 93.940002 | 13.977195 | 1.357679 | 9 | 180 | -240840000 | 629551000 | 1.26 | 4.996437e+08 | 74.555557 | 4.199651 |
| 4 | ADI | Analog Devices, Inc. | Information Technology | Semiconductors | 55.320000 | -1.827858 | 1.701169 | 14 | 272 | 315120000 | 696878000 | 0.31 | 2.247994e+09 | 178.451613 | 1.059810 |
df.shape
(340, 15)
The stock data has 340 rows and 15 columns.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 340 entries, 0 to 339
Data columns (total 15 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   Ticker Symbol                 340 non-null    object
 1   Security                      340 non-null    object
 2   GICS Sector                   340 non-null    object
 3   GICS Sub Industry             340 non-null    object
 4   Current Price                 340 non-null    float64
 5   Price Change                  340 non-null    float64
 6   Volatility                    340 non-null    float64
 7   ROE                           340 non-null    int64
 8   Cash Ratio                    340 non-null    int64
 9   Net Cash Flow                 340 non-null    int64
 10  Net Income                    340 non-null    int64
 11  Earnings Per Share            340 non-null    float64
 12  Estimated Shares Outstanding  340 non-null    float64
 13  P/E Ratio                     340 non-null    float64
 14  P/B Ratio                     340 non-null    float64
dtypes: float64(7), int64(4), object(4)
memory usage: 40.0+ KB
There are 4 columns of type object (Ticker Symbol, Security, GICS Sector, and GICS Sub Industry) and 11 numeric columns, of which 7 are float64 and 4 are int64.
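Since the later steps treat the text columns and the numeric columns differently, it can help to collect the two groups of column names once. The following is a minimal sketch on a hypothetical three-row frame (the tickers and values are invented; only the column names mirror the data set). `cat_cols` and `num_cols` are illustrative names, not variables used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical frame that mimics a few of the columns above
toy = pd.DataFrame({
    "Ticker Symbol": ["AAA", "BBB", "CCC"],
    "GICS Sector": ["Energy", "Financials", "Energy"],
    "Current Price": [42.35, 59.24, 44.91],
    "ROE": [135, 130, 21],
})

# numeric columns (int and float) are the candidates for scaling and clustering
num_cols = toy.select_dtypes(include="number").columns.tolist()

# everything that is not numeric is treated as a label/categorical column
cat_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(num_cols)
print(cat_cols)
```

Applied to `df`, the same two calls would be expected to return the 11 numeric and 4 text columns listed by `df.info()`.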
df.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Current Price | 340.0 | 8.086234e+01 | 9.805509e+01 | 4.500000e+00 | 3.855500e+01 | 5.970500e+01 | 9.288000e+01 | 1.274950e+03 |
| Price Change | 340.0 | 4.078194e+00 | 1.200634e+01 | -4.712969e+01 | -9.394838e-01 | 4.819505e+00 | 1.069549e+01 | 5.505168e+01 |
| Volatility | 340.0 | 1.525976e+00 | 5.917984e-01 | 7.331632e-01 | 1.134878e+00 | 1.385593e+00 | 1.695549e+00 | 4.580042e+00 |
| ROE | 340.0 | 3.959706e+01 | 9.654754e+01 | 1.000000e+00 | 9.750000e+00 | 1.500000e+01 | 2.700000e+01 | 9.170000e+02 |
| Cash Ratio | 340.0 | 7.002353e+01 | 9.042133e+01 | 0.000000e+00 | 1.800000e+01 | 4.700000e+01 | 9.900000e+01 | 9.580000e+02 |
| Net Cash Flow | 340.0 | 5.553762e+07 | 1.946365e+09 | -1.120800e+10 | -1.939065e+08 | 2.098000e+06 | 1.698108e+08 | 2.076400e+10 |
| Net Income | 340.0 | 1.494385e+09 | 3.940150e+09 | -2.352800e+10 | 3.523012e+08 | 7.073360e+08 | 1.899000e+09 | 2.444200e+10 |
| Earnings Per Share | 340.0 | 2.776662e+00 | 6.587779e+00 | -6.120000e+01 | 1.557500e+00 | 2.895000e+00 | 4.620000e+00 | 5.009000e+01 |
| Estimated Shares Outstanding | 340.0 | 5.770283e+08 | 8.458496e+08 | 2.767216e+07 | 1.588482e+08 | 3.096751e+08 | 5.731175e+08 | 6.159292e+09 |
| P/E Ratio | 340.0 | 3.261256e+01 | 4.434873e+01 | 2.935451e+00 | 1.504465e+01 | 2.081988e+01 | 3.176476e+01 | 5.280391e+02 |
| P/B Ratio | 340.0 | -1.718249e+00 | 1.396691e+01 | -7.611908e+01 | -4.352056e+00 | -1.067170e+00 | 3.917066e+00 | 1.290646e+02 |
df.isnull().sum()
Ticker Symbol                   0
Security                        0
GICS Sector                     0
GICS Sub Industry               0
Current Price                   0
Price Change                    0
Volatility                      0
ROE                             0
Cash Ratio                      0
Net Cash Flow                   0
Net Income                      0
Earnings Per Share              0
Estimated Shares Outstanding    0
P/E Ratio                       0
P/B Ratio                       0
dtype: int64
There are no missing values.
df.duplicated().sum()
0
There are no duplicates in the data set.
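The two checks above can be wrapped into one small helper so they are easy to rerun after any later transformation. This is a sketch only: `quality_report` is a hypothetical function name, and the toy frame deliberately contains one missing value and one duplicated row so that both counts are non-zero.

```python
import pandas as pd

def quality_report(frame):
    """Return the number of missing cells and of fully duplicated rows."""
    return {
        "missing_cells": int(frame.isnull().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
    }

# Hypothetical data: the third row repeats the second, the last price is missing
toy = pd.DataFrame({
    "Ticker Symbol": ["AAA", "BBB", "BBB", "CCC"],
    "Current Price": [10.0, 20.0, 20.0, None],
})

report = quality_report(toy)
print(report)
```

For the stock data, both counts would be expected to be 0, matching the outputs of `df.isnull().sum()` and `df.duplicated().sum()` shown above.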
cols = df.columns
#print the unique values of every column, followed by a separator line
for col in cols:
    print("Unique values in {}".format(col), df[col].unique())
    print("---" * 100)
Unique values in Ticker Symbol ['AAL' 'ABBV' 'ABT' 'ADBE' 'ADI' 'ADM' 'ADS' 'AEE' 'AEP' 'AFL' 'AIG' 'AIV'
'AIZ' 'AJG' 'AKAM' 'ALB' 'ALK' 'ALL' 'ALLE' 'ALXN' 'AMAT' 'AME' 'AMG'
'AMGN' 'AMP' 'AMT' 'AMZN' 'AN' 'ANTM' 'AON' 'APA' 'APC' 'APH' 'ARNC'
'ATVI' 'AVB' 'AVGO' 'AWK' 'AXP' 'BA' 'BAC' 'BAX' 'BBT' 'BCR' 'BHI' 'BIIB'
'BK' 'BLL' 'BMY' 'BSX' 'BWA' 'BXP' 'C' 'CAT' 'CB' 'CBG' 'CCI' 'CCL'
'CELG' 'CF' 'CFG' 'CHD' 'CHK' 'CHRW' 'CHTR' 'CI' 'CINF' 'CL' 'CMA' 'CME'
'CMG' 'CMI' 'CMS' 'CNC' 'CNP' 'COF' 'COG' 'COO' 'CSX' 'CTL' 'CTSH' 'CTXS'
'CVS' 'CVX' 'CXO' 'D' 'DAL' 'DD' 'DE' 'DFS' 'DGX' 'DHR' 'DIS' 'DISCA'
'DISCK' 'DLPH' 'DLR' 'DNB' 'DOV' 'DPS' 'DUK' 'DVA' 'DVN' 'EBAY' 'ECL'
'ED' 'EFX' 'EIX' 'EMN' 'EOG' 'EQIX' 'EQR' 'EQT' 'ES' 'ESS' 'ETFC' 'ETN'
'ETR' 'EW' 'EXC' 'EXPD' 'EXPE' 'EXR' 'F' 'FAST' 'FB' 'FBHS' 'FCX' 'FE'
'FIS' 'FISV' 'FLIR' 'FLR' 'FLS' 'FMC' 'FRT' 'FSLR' 'FTR' 'GD' 'GGP'
'GILD' 'GLW' 'GM' 'GPC' 'GRMN' 'GT' 'GWW' 'HAL' 'HAS' 'HBAN' 'HCA' 'HCN'
'HCP' 'HES' 'HIG' 'HOG' 'HON' 'HPE' 'HPQ' 'HRL' 'HSIC' 'HST' 'HSY' 'HUM'
'IBM' 'IDXX' 'IFF' 'INTC' 'IP' 'IPG' 'IRM' 'ISRG' 'ITW' 'IVZ' 'JBHT'
'JEC' 'JNPR' 'JPM' 'KIM' 'KMB' 'KMI' 'KO' 'KSU' 'LEG' 'LEN' 'LH' 'LKQ'
'LLL' 'LLY' 'LMT' 'LNT' 'LUK' 'LUV' 'LVLT' 'LYB' 'MA' 'MAA' 'MAC' 'MAR'
'MAS' 'MAT' 'MCD' 'MCO' 'MDLZ' 'MET' 'MHK' 'MJN' 'MKC' 'MLM' 'MMC' 'MMM'
'MNST' 'MO' 'MOS' 'MPC' 'MRK' 'MRO' 'MTB' 'MTD' 'MUR' 'MYL' 'NAVI' 'NBL'
'NDAQ' 'NEE' 'NEM' 'NFLX' 'NFX' 'NLSN' 'NOV' 'NSC' 'NTRS' 'NUE' 'NWL' 'O'
'OKE' 'OMC' 'ORLY' 'OXY' 'PBCT' 'PBI' 'PCAR' 'PCG' 'PCLN' 'PEG' 'PEP'
'PFE' 'PFG' 'PG' 'PGR' 'PHM' 'PM' 'PNC' 'PNR' 'PNW' 'PPG' 'PPL' 'PRU'
'PSX' 'PWR' 'PX' 'PYPL' 'R' 'RCL' 'REGN' 'RHI' 'ROP' 'RRC' 'RSG' 'SCG'
'SCHW' 'SE' 'SEE' 'SHW' 'SLG' 'SNI' 'SO' 'SPG' 'SPGI' 'SRCL' 'SRE' 'STI'
'STT' 'SWKS' 'SWN' 'SYF' 'SYK' 'T' 'TAP' 'TDC' 'TGNA' 'TMK' 'TMO' 'TRIP'
'TRV' 'TSCO' 'TSN' 'TSO' 'TSS' 'TXN' 'UAA' 'UAL' 'UDR' 'UHS' 'UNH' 'UNM'
'UNP' 'UPS' 'UTX' 'VAR' 'VLO' 'VMC' 'VNO' 'VRSK' 'VRSN' 'VRTX' 'VTR' 'VZ'
'WAT' 'WEC' 'WFC' 'WHR' 'WM' 'WMB' 'WU' 'WY' 'WYN' 'WYNN' 'XEC' 'XEL'
'XL' 'XOM' 'XRAY' 'XRX' 'XYL' 'YHOO' 'YUM' 'ZBH' 'ZION' 'ZTS']
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Security ['American Airlines Group' 'AbbVie' 'Abbott Laboratories'
'Adobe Systems Inc' 'Analog Devices, Inc.' 'Archer-Daniels-Midland Co'
'Alliance Data Systems' 'Ameren Corp' 'American Electric Power'
'AFLAC Inc' 'American International Group, Inc.'
'Apartment Investment & Mgmt' 'Assurant Inc' 'Arthur J. Gallagher & Co.'
'Akamai Technologies Inc' 'Albemarle Corp' 'Alaska Air Group Inc'
'Allstate Corp' 'Allegion' 'Alexion Pharmaceuticals'
'Applied Materials Inc' 'AMETEK Inc' 'Affiliated Managers Group Inc'
'Amgen Inc' 'Ameriprise Financial' 'American Tower Corp A'
'Amazon.com Inc' 'AutoNation Inc' 'Anthem Inc.' 'Aon plc'
'Apache Corporation' 'Anadarko Petroleum Corp' 'Amphenol Corp'
'Arconic Inc' 'Activision Blizzard' 'AvalonBay Communities, Inc.'
'Broadcom' 'American Water Works Company Inc' 'American Express Co'
'Boeing Company' 'Bank of America Corp' 'Baxter International Inc.'
'BB&T Corporation' 'Bard (C.R.) Inc.' 'Baker Hughes Inc'
'BIOGEN IDEC Inc.' 'The Bank of New York Mellon Corp.' 'Ball Corp'
'Bristol-Myers Squibb' 'Boston Scientific' 'BorgWarner'
'Boston Properties' 'Citigroup Inc.' 'Caterpillar Inc.' 'Chubb Limited'
'CBRE Group' 'Crown Castle International Corp.' 'Carnival Corp.'
'Celgene Corp.' 'CF Industries Holdings Inc' 'Citizens Financial Group'
'Church & Dwight' 'Chesapeake Energy' 'C. H. Robinson Worldwide'
'Charter Communications' 'CIGNA Corp.' 'Cincinnati Financial'
'Colgate-Palmolive' 'Comerica Inc.' 'CME Group Inc.'
'Chipotle Mexican Grill' 'Cummins Inc.' 'CMS Energy'
'Centene Corporation' 'CenterPoint Energy' 'Capital One Financial'
'Cabot Oil & Gas' 'The Cooper Companies' 'CSX Corp.' 'CenturyLink Inc'
'Cognizant Technology Solutions' 'Citrix Systems' 'CVS Health'
'Chevron Corp.' 'Concho Resources' 'Dominion Resources' 'Delta Air Lines'
'Du Pont (E.I.)' 'Deere & Co.' 'Discover Financial Services'
'Quest Diagnostics' 'Danaher Corp.' 'The Walt Disney Company'
'Discovery Communications-A' 'Discovery Communications-C'
'Delphi Automotive' 'Digital Realty Trust' 'Dun & Bradstreet'
'Dover Corp.' 'Dr Pepper Snapple Group' 'Duke Energy' 'DaVita Inc.'
'Devon Energy Corp.' 'eBay Inc.' 'Ecolab Inc.' 'Consolidated Edison'
'Equifax Inc.' "Edison Int'l" 'Eastman Chemical' 'EOG Resources'
'Equinix' 'Equity Residential' 'EQT Corporation' 'Eversource Energy'
'Essex Property Trust, Inc.' 'E*Trade' 'Eaton Corporation'
'Entergy Corp.' 'Edwards Lifesciences' 'Exelon Corp.' "Expeditors Int'l"
'Expedia Inc.' 'Extra Space Storage' 'Ford Motor' 'Fastenal Co'
'Facebook' 'Fortune Brands Home & Security' 'Freeport-McMoran Cp & Gld'
'FirstEnergy Corp' 'Fidelity National Information Services' 'Fiserv Inc'
'FLIR Systems' 'Fluor Corp.' 'Flowserve Corporation' 'FMC Corporation'
'Federal Realty Investment Trust' 'First Solar Inc'
'Frontier Communications' 'General Dynamics'
'General Growth Properties Inc.' 'Gilead Sciences' 'Corning Inc.'
'General Motors' 'Genuine Parts' 'Garmin Ltd.' 'Goodyear Tire & Rubber'
'Grainger (W.W.) Inc.' 'Halliburton Co.' 'Hasbro Inc.'
'Huntington Bancshares' 'HCA Holdings' 'Welltower Inc.' 'HCP Inc.'
'Hess Corporation' 'Hartford Financial Svc.Gp.' 'Harley-Davidson'
"Honeywell Int'l Inc." 'Hewlett Packard Enterprise' 'HP Inc.'
'Hormel Foods Corp.' 'Henry Schein' 'Host Hotels & Resorts'
'The Hershey Company' 'Humana Inc.' 'International Business Machines'
'IDEXX Laboratories' 'Intl Flavors & Fragrances' 'Intel Corp.'
'International Paper' 'Interpublic Group' 'Iron Mountain Incorporated'
'Intuitive Surgical Inc.' 'Illinois Tool Works' 'Invesco Ltd.'
'J. B. Hunt Transport Services' 'Jacobs Engineering Group'
'Juniper Networks' 'JPMorgan Chase & Co.' 'Kimco Realty' 'Kimberly-Clark'
'Kinder Morgan' 'Coca Cola Company' 'Kansas City Southern'
'Leggett & Platt' 'Lennar Corp.' 'Laboratory Corp. of America Holding'
'LKQ Corporation' 'L-3 Communications Holdings' 'Lilly (Eli) & Co.'
'Lockheed Martin Corp.' 'Alliant Energy Corp' 'Leucadia National Corp.'
'Southwest Airlines' 'Level 3 Communications' 'LyondellBasell'
'Mastercard Inc.' 'Mid-America Apartments' 'Macerich' "Marriott Int'l."
'Masco Corp.' 'Mattel Inc.' "McDonald's Corp." "Moody's Corp"
'Mondelez International' 'MetLife Inc.' 'Mohawk Industries'
'Mead Johnson' 'McCormick & Co.' 'Martin Marietta Materials'
'Marsh & McLennan' '3M Company' 'Monster Beverage' 'Altria Group Inc'
'The Mosaic Company' 'Marathon Petroleum' 'Merck & Co.'
'Marathon Oil Corp.' 'M&T Bank Corp.' 'Mettler Toledo' 'Murphy Oil'
'Mylan N.V.' 'Navient' 'Noble Energy Inc' 'NASDAQ OMX Group'
'NextEra Energy' 'Newmont Mining Corp. (Hldg. Co.)' 'Netflix Inc.'
'Newfield Exploration Co' 'Nielsen Holdings'
'National Oilwell Varco Inc.' 'Norfolk Southern Corp.'
'Northern Trust Corp.' 'Nucor Corp.' 'Newell Brands'
'Realty Income Corporation' 'ONEOK' 'Omnicom Group' "O'Reilly Automotive"
'Occidental Petroleum' "People's United Financial" 'Pitney-Bowes'
'PACCAR Inc.' 'PG&E Corp.' 'Priceline.com Inc'
'Public Serv. Enterprise Inc.' 'PepsiCo Inc.' 'Pfizer Inc.'
'Principal Financial Group' 'Procter & Gamble' 'Progressive Corp.'
'Pulte Homes Inc.' 'Philip Morris International' 'PNC Financial Services'
'Pentair Ltd.' 'Pinnacle West Capital' 'PPG Industries' 'PPL Corp.'
'Prudential Financial' 'Phillips 66' 'Quanta Services Inc.'
'Praxair Inc.' 'PayPal' 'Ryder System' 'Royal Caribbean Cruises Ltd'
'Regeneron' 'Robert Half International' 'Roper Industries'
'Range Resources Corp.' 'Republic Services Inc' 'SCANA Corp'
'Charles Schwab Corporation' 'Spectra Energy Corp.' 'Sealed Air'
'Sherwin-Williams' 'SL Green Realty' 'Scripps Networks Interactive Inc.'
'Southern Co.' 'Simon Property Group Inc' 'S&P Global, Inc.'
'Stericycle Inc' 'Sempra Energy' 'SunTrust Banks' 'State Street Corp.'
'Skyworks Solutions' 'Southwestern Energy' 'Synchrony Financial'
'Stryker Corp.' 'AT&T Inc' 'Molson Coors Brewing Company'
'Teradata Corp.' 'Tegna, Inc.' 'Torchmark Corp.'
'Thermo Fisher Scientific' 'TripAdvisor' 'The Travelers Companies Inc.'
'Tractor Supply Company' 'Tyson Foods' 'Tesoro Petroleum Co.'
'Total System Services' 'Texas Instruments' 'Under Armour'
'United Continental Holdings' 'UDR Inc' 'Universal Health Services, Inc.'
'United Health Group Inc.' 'Unum Group' 'Union Pacific'
'United Parcel Service' 'United Technologies' 'Varian Medical Systems'
'Valero Energy' 'Vulcan Materials' 'Vornado Realty Trust'
'Verisk Analytics' 'Verisign Inc.' 'Vertex Pharmaceuticals Inc'
'Ventas Inc' 'Verizon Communications' 'Waters Corporation'
'Wec Energy Group Inc' 'Wells Fargo' 'Whirlpool Corp.'
'Waste Management Inc.' 'Williams Cos.' 'Western Union Co'
'Weyerhaeuser Corp.' 'Wyndham Worldwide' 'Wynn Resorts Ltd'
'Cimarex Energy' 'Xcel Energy Inc' 'XL Capital' 'Exxon Mobil Corp.'
'Dentsply Sirona' 'Xerox Corp.' 'Xylem Inc.' 'Yahoo Inc.'
'Yum! Brands Inc' 'Zimmer Biomet Holdings' 'Zions Bancorp' 'Zoetis']
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in GICS Sector ['Industrials' 'Health Care' 'Information Technology' 'Consumer Staples'
'Utilities' 'Financials' 'Real Estate' 'Materials'
'Consumer Discretionary' 'Energy' 'Telecommunications Services']
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in GICS Sub Industry ['Airlines' 'Pharmaceuticals' 'Health Care Equipment'
'Application Software' 'Semiconductors' 'Agricultural Products'
'Data Processing & Outsourced Services' 'MultiUtilities'
'Electric Utilities' 'Life & Health Insurance'
'Property & Casualty Insurance' 'REITs' 'Multi-line Insurance'
'Insurance Brokers' 'Internet Software & Services' 'Specialty Chemicals'
'Building Products' 'Biotechnology' 'Semiconductor Equipment'
'Electrical Components & Equipment' 'Asset Management & Custody Banks'
'Specialized REITs' 'Internet & Direct Marketing Retail'
'Specialty Stores' 'Managed Health Care'
'Oil & Gas Exploration & Production' 'Electronic Components'
'Aerospace & Defense' 'Home Entertainment Software' 'Residential REITs'
'Water Utilities' 'Consumer Finance' 'Banks'
'Oil & Gas Equipment & Services' 'Metal & Glass Containers'
'Health Care Distributors' 'Auto Parts & Equipment'
'Construction & Farm Machinery & Heavy Trucks' 'Real Estate Services'
'Hotels, Resorts & Cruise Lines' 'Fertilizers & Agricultural Chemicals'
'Regional Banks' 'Household Products' 'Integrated Oil & Gas'
'Air Freight & Logistics' 'Cable & Satellite'
'Financial Exchanges & Data' 'Restaurants' 'Industrial Machinery'
'Health Care Supplies' 'Railroads'
'Integrated Telecommunications Services' 'IT Consulting & Other Services'
'Drug Retail' 'Diversified Chemicals' 'Health Care Facilities'
'Industrial Conglomerates' 'Broadcasting & Cable TV'
'Research & Consulting Services' 'Soft Drinks'
'Investment Banking & Brokerage' 'Automobile Manufacturers' 'Copper'
'Electronic Equipment & Instruments' 'Diversified Commercial Services'
'Retail REITs' 'Consumer Electronics' 'Tires & Rubber'
'Industrial Materials' 'Leisure Products' 'Motorcycle Manufacturers'
'Technology Hardware, Storage & Peripherals' 'Computer Hardware'
'Packaged Foods & Meats' 'Paper Packaging' 'Advertising' 'Trucking'
'Networking Equipment' 'Oil & Gas Refining & Marketing & Transportation'
'Homebuilding' 'Distributors' 'Multi-Sector Holdings'
'Alternative Carriers' 'Diversified Financial Services'
'Home Furnishings' 'Construction Materials' 'Tobacco'
'Life Sciences Tools & Services' 'Gold' 'Steel'
'Housewares & Specialties' 'Thrifts & Mortgage Finance'
'Technology, Hardware, Software and Supplies' 'Personal Products'
'Industrial Gases' 'Human Resource & Employment Services' 'Office REITs'
'Brewers' 'Publishing' 'Specialty Retail'
'Apparel, Accessories & Luxury Goods' 'Household Appliances'
'Environmental Services' 'Casinos & Gaming']
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Current Price [ 42.349998 59.240002 44.91 93.940002 55.32
36.68 276.570007 43.23 58.27 59.900002
61.970001 40.029999 80.540001 40.939999 52.630001
56.009998 80.510002 62.09 65.919998 190.75
18.67 53.59 159.759995 162.330002 106.419998
96.949997 675.890015 59.66 139.440002 92.209999
44.470001 48.580002 52.23 7.3988066 38.709999
184.130005 145.149994 59.75 69.550003 144.589996
16.83 38.150002 37.810001 189.440002 46.150002
306.350006 41.220001 72.730003 68.790001 18.440001
127.540001 51.75 67.959999 116.849998 34.580002
86.449997 54.48 119.760002 40.810001 26.190001
42.4399985 4.5 62.02 183.100006 146.330002
59.169998 66.620003 41.830002 90.599998 479.850006
88.010002 36.080002 65.809998 18.360001 72.18
17.690001 134.199997 25.950001 25.16 60.02
75.650002 97.769997 89.959999 92.860001 67.639999
50.689999 66.599998 76.269997 53.619999 71.139999
70.41698484 105.080002 26.68 25.219999 85.730003
75.620003 103.93 61.310001 93.199997 71.389999
69.709999 32. 27.48 114.379997 64.269997
111.370003 59.209999 67.510002 70.790001 302.399994
81.589996 52.130001 51.07 239.410004 29.639999
52.040001 68.360001 78.980003 27.77 45.099998
124.300003 88.209999 14.09 40.82 104.660004
55.5 6.77 31.73 60.599998 91.459999
28.07 47.220001 42.080002 39.130001 146.100006
65.989998 4.67 137.360001 27.209999 101.190002
18.280001 34.009998 85.889999 37.169998 32.669998
202.589996 34.040001 67.360001 11.06 67.629997
68.029999 34.82695902 48.48 43.459999 45.389999
103.57 15.2 11.84 39.540001 158.190002
15.34 89.269997 178.509995 137.619995 72.919998
119.639999 34.450001 37.700001 23.280001 27.01
546.159973 92.68 33.48 73.360001 41.950001
27.6 66.029999 26.459999 127.300003 14.92
42.959999 74.669998 42.02 48.91 123.639999
29.629999 119.510002 84.260002 217.149994 31.2250005
17.389999 43.060001 54.360001 86.900002 97.360001
90.809998 80.690002 67.040001 28.299999 27.17
118.139999 100.339996 44.84 48.209999 189.389999
78.949997 85.559998 136.580002 55.450001 150.639999
49.65333167 58.209999 27.59 51.84 52.82
12.59 121.18 339.130005 22.450001 54.07
11.45 32.93 58.169998 103.889999 17.99
32.560001 46.599998 33.490002 84.589996 72.089996
40.299999 44.080002 51.630001 24.66 75.660004
253.419998 67.610001 16.15 20.65 47.400002
53.189999 1274.949951 38.689999 99.919998 32.279999
44.98 79.410004 31.799999 17.82 87.910004
95.309998 49.529999 64.480003 98.82 34.130001
81.410004 81.800003 20.25 102.400002 36.200001
56.830002 101.209999 542.869995 47.139999 189.789993
24.610001 43.990002 60.490002 23.940001 44.599998
259.600006 112.980003 55.209999 46.790001 194.440002
98.580002 120.599998 94.010002 42.84 66.360001
76.830002 7.11 30.41 92.940002 34.41
93.919998 26.42 25.52 57.16 141.850006
85.25 112.860001 85.5 53.330002 105.370003
49.799999 54.810001 80.610001 57.299999 37.57
119.489998 117.639999 33.290001 78.199997 96.230003
96.07 80.800003 70.709999 94.970001 99.959999
76.879997 87.360001 125.830002 56.43 46.220001
134.580002 51.310001 146.869995 53.369999 25.700001
17.91 29.98 72.650002 69.190002 89.379997
35.91 39.18 77.949997 60.849998 10.63
36.5 33.259998 52.51617541 102.589996 27.299999
47.919998 ]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Price Change [ 9.99999481e+00 8.33943306e+00 1.13011208e+01 1.39771952e+01
-1.82785810e+00 -1.20172682e+01 6.18928557e+00 2.17442444e+00
2.37175342e+00 3.02718100e+00 8.35810821e+00 7.57860810e+00
1.89777325e+00 -6.06943448e-01 -2.37909028e+01 2.64619479e+01
2.06643644e+00 6.59227468e+00 1.37532301e+01 2.23383802e+01
2.68342391e+01 2.21247377e+00 -6.61133544e+00 1.71634778e+01
-2.42068591e+00 1.02330873e+01 3.22681047e+01 2.35031562e+00
-6.20052749e-01 3.91030097e+00 1.13978037e+01 -2.08020835e+01
2.69366688e+00 1.64778443e+00 2.33195293e+01 4.85763024e+00
1.79026828e+01 8.59687386e+00 -6.21629012e+00 1.01050779e+01
8.44072165e+00 1.67023651e+01 5.94004500e+00 1.54918196e+00
-1.23123672e+01 4.91798229e+00 5.42200283e+00 1.65358164e+01
1.60816796e+01 1.17575818e+01 3.47056255e+00 7.20349662e+00
4.71469465e+00 3.55020891e+00 1.31938338e+01 8.19775683e+00
9.56906820e+00 8.93821671e+00 8.44879290e+00 -9.25061131e+00
1.02736884e+01 1.04761548e+00 -3.81017882e+01 -9.00822130e+00
3.59850674e+00 8.68241465e+00 9.77735771e+00 4.78137921e+00
1.90012916e+00 -2.40224491e+00 -3.31312678e+01 -1.88847908e+01
1.94971184e+00 2.17125911e+01 1.43646961e+00 -6.19574582e-01
-2.00993595e+01 -9.67221466e+00 -4.34942146e+00 1.59231682e-01
-4.65448920e+00 9.02147729e+00 1.32656133e+00 1.28449547e+01
-2.74403217e+00 -3.98864176e+00 1.33750842e+01 3.74896767e+01
3.95256083e+00 3.65358399e+00 1.56747951e+01 8.92459471e+00
2.04914148e+00 2.02676864e+00 3.57289117e+00 1.21093264e+01
1.55739004e+01 -1.18843887e+00 6.97958459e+00 1.80493990e+01
-8.33447724e-01 -3.62229079e+00 -1.54780794e+01 1.21632653e+01
3.78368391e+00 -3.97430599e+00 1.45310626e+01 -6.13507114e+00
3.65423785e+00 -4.07859333e+00 1.00196502e+01 8.03760493e+00
-2.12537714e+01 7.09921134e-01 6.76507254e+00 1.26567850e+01
1.16641138e+00 4.91098343e+00 1.16167337e+01 -6.40377486e+00
-4.44915880e+00 4.89451730e+00 1.39222511e+01 2.39825581e+00
1.09842336e+01 1.62243204e+01 1.68175170e+01 -3.16851665e+01
1.17984371e+00 -1.05535085e+01 5.23529489e+00 2.14209211e-01
1.08190563e+01 2.21035716e+00 1.50882382e+01 6.80606293e+00
5.50516834e+01 -2.30125523e+00 -4.63767391e-01 4.21293741e+00
2.68926423e+00 6.58892711e+00 1.22812706e+01 4.03343154e+00
3.39359379e+00 1.04462407e+01 -5.33619890e+00 -5.10175091e+00
-7.07683424e+00 4.14312618e+00 -1.25323370e+01 4.41161760e-02
2.21865805e+00 -4.58571335e+00 -5.00546667e+00 -1.72470362e+01
9.32024719e+00 -1.78378378e+01 2.16175949e+00 2.44962248e+01
1.83171321e+01 -3.21766562e+00 -3.26181408e+00 -1.45443304e-01
-5.29213620e+00 -1.56588009e+00 1.49610829e+01 1.40350948e+01
-2.65128620e-02 2.18210350e+01 -1.30672675e+01 1.87330126e+01
1.27768314e+01 7.06747682e+00 2.96140491e+00 1.15394839e+01
7.35122938e+00 8.03337710e+00 8.70993837e+00 1.75113086e+01
-4.71296934e+01 6.81252594e+00 -1.84380169e+01 1.96554482e+00
1.70513620e+00 1.41748988e+01 4.44130404e+00 1.45390134e+01
7.89478488e-01 5.25422719e+00 6.64275945e+00 -1.42927642e+01
1.38551058e+01 2.47075040e+01 2.57318340e+00 7.49696478e+00
1.06224905e+01 4.18335071e+00 -1.97396991e+00 1.16370769e+01
3.06250063e+01 1.99390853e+01 2.34597611e+00 6.07996215e+00
1.36669047e+00 3.51442488e+00 1.20811964e+01 6.97673767e+00
-1.08660148e+01 6.02294849e+00 5.92784726e+00 1.08003574e+01
6.88578786e+00 -1.12290862e+01 1.15078463e+01 7.03141265e+00
-2.02659911e+01 -3.61785059e-01 1.89429051e+01 -8.59119742e+00
3.31773465e+01 1.86832740e+00 7.29879090e+00 8.81032377e+00
6.23785452e+00 1.08441158e+01 1.11456540e+01 -3.29669458e+00
4.93131727e+00 -1.25587392e+01 9.52996596e+00 5.79688444e+00
6.58555391e+00 9.98003942e+00 8.42083596e+00 -2.41230769e+01
1.48103212e+01 9.64142629e-01 8.65287198e-01 3.12899106e+00
3.82102080e+00 -9.31700402e+00 5.10205991e-01 3.19052723e+00
-8.23055266e+00 6.07218809e+00 3.13099052e+00 -5.30526316e+00
1.06605376e+01 3.51562511e+00 -5.56439292e+00 1.03288203e+01
6.99370887e+00 -3.03446151e+00 4.98751528e-01 1.91607380e+01
3.42424545e+00 6.58550301e+00 5.37164261e+00 -1.66323624e+01
2.93833502e-01 1.74562005e+01 -2.32441907e+01 1.34259729e+01
1.69953198e+01 -7.65915784e+00 2.04327672e+01 -2.51065117e+01
6.74594290e+00 7.23276369e+00 1.54628331e+01 -9.89837787e+00
-5.14675032e+00 1.65379834e+01 4.00442430e+00 1.22382601e+01
4.37206985e+00 5.28482205e+00 1.40444235e+01 -1.39063419e+01
-2.79184886e+00 1.19707325e+01 -8.66449033e-01 -8.51393278e+00
-4.47981366e+01 -2.87447789e+00 -1.65079153e+00 5.94211823e+00
1.31293681e+01 -8.83367840e+00 1.36242259e+01 1.16814159e+00
1.56071797e+01 3.48039173e+01 1.30295476e+01 1.30331514e+00
2.32493691e+01 8.58409101e+00 9.23447905e+00 9.97191212e+00
-1.69482767e+01 8.21529352e+00 8.58382131e+00 -5.13655212e+00
1.46627305e+00 3.80418148e+00 -1.23711354e+01 -2.79797677e+00
8.06523941e+00 9.24824783e+00 1.73415223e+01 6.02880540e+00
1.00275192e+01 -1.44853861e+00 2.34595796e+01 2.19283001e+01
2.13104241e-01 6.27730254e+00 1.39253411e+01 -1.98662281e+00
5.53291227e+00 -2.30970711e-01 7.06118584e+00 -3.09881858e+01
-2.61010890e+00 8.54452902e+00 1.00097595e+00 2.94965413e+01
-1.44033722e+01 1.38340493e+00 7.69653360e+00 3.65691504e+00
1.99014739e+01 9.47476828e+00 1.10097290e+01 1.48877266e+01
-8.69891720e+00 9.34768280e+00 -1.15858794e+00 1.66788361e+01]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Volatility [1.68715106 2.19788722 1.27364601 1.35767892 1.70116879 1.51649264
1.11697633 1.12418643 1.06848509 1.04829468 1.10696539 1.16333364
1.11260424 1.05205031 1.38450167 1.97432252 1.77343079 1.05326602
1.28379478 2.02292079 1.46003013 1.08926624 2.09306542 1.6302589
1.22225951 1.16580402 1.46038649 1.4809144 1.51165436 1.10503221
2.40540795 2.43516482 1.00776167 2.59206524 1.88633452 1.13287498
1.84717959 1.17152528 0.9000662 1.15590486 1.41868781 1.20452559
1.07767848 1.39443569 2.55955343 1.82599372 1.20166021 1.38668418
1.49887213 1.49176424 2.0587686 1.08946885 1.26198423 1.49355294
0.94484686 1.29785712 0.96019059 1.34723905 2.00082766 2.36818555
1.18923614 0.92902624 4.55981453 1.18547264 1.697942 1.58839801
0.93581223 0.89547081 1.55765486 1.32334793 2.47400169 1.47236379
1.03784428 2.29869617 1.38986704 1.36459176 3.05581758 1.55505705
1.62621905 1.52219355 1.33812318 1.96886384 1.48736738 1.75065524
2.69254622 0.88993126 1.44421868 1.57788128 1.55194552 1.15989696
1.38148979 1.19146592 1.18845397 1.6892346 1.81214432 1.44088445
1.07040583 1.33792353 1.50756854 1.15079692 1.09672735 1.2116427
2.92369768 1.40930182 1.07851567 1.0680017 1.08104026 0.92725976
1.40450816 1.94110389 1.30808151 1.05618603 2.36488263 1.23282885
1.11842457 1.4520483 1.52142989 1.21740068 1.66648249 1.35159454
1.0625527 1.57874658 1.18605933 1.15145372 1.41139553 1.32060613
1.34829666 3.79641004 1.23878468 1.14829451 0.90448658 1.76119277
1.77445407 1.78166051 2.17573838 1.23985802 2.07521591 2.02681808
0.93954369 1.39034206 1.49406049 1.57848271 1.34451442 1.17702685
1.66547459 1.52277794 1.34859719 1.9660615 1.58335507 1.33779254
1.91490726 1.34173146 1.28228628 2.3985804 1.14733221 1.56037194
1.10344894 3.40049106 2.37335889 1.07845549 1.01392243 1.59462774
1.18838275 1.61520603 1.08288064 1.46958571 1.15285453 1.22602189
1.3016302 1.13979942 1.30138165 1.12600938 1.14286861 1.58083869
1.21837263 1.73299014 1.84176674 1.13033734 1.22468825 0.87040465
3.13935157 0.88991256 2.07163924 1.20403738 1.56916714 1.60312963
1.42723709 1.51343434 1.44062152 0.90309766 1.11584212 1.5542353
1.53629044 1.45701312 1.6097448 1.09587575 1.17777601 1.16932806
1.64244991 1.4283592 1.921708 0.73316318 1.26879979 1.32154764
1.13865048 1.49247842 1.71840329 1.03222106 2.16414979 1.03416246
0.98269841 1.58594449 0.95900798 2.83067523 1.98937101 1.27846009
3.32538642 1.38038978 1.11537571 2.85118007 2.29930409 2.23082675
2.50943706 1.56325826 1.02337505 2.53605034 2.60594879 2.42152915
1.19849259 1.95202036 2.1688136 1.28156631 1.46061929 1.64129973
1.10458128 3.56017767 1.0663693 1.08937038 1.58951974 1.1328129
1.25961075 1.4395637 1.03980326 1.26834034 1.18066131 0.80535713
1.2387481 1.52898492 0.80605597 1.08689787 1.69475119 0.86145337
1.12053368 1.8759105 1.14342144 1.5330026 1.10905888 1.22746706
1.37958859 2.95429144 1.13123977 1.92575419 1.94596571 1.5565117
1.80234528 1.14237004 1.05880661 3.71299533 0.83982097 1.26623967
1.45693998 2.03078573 1.58011699 1.42648779 1.09196684 1.7738653
0.8950593 1.13554603 1.08085768 1.20381569 1.12644779 1.43793764
1.44464394 2.01739436 4.58004173 1.83502828 1.13816285 0.85944183
1.21780338 2.73065911 1.79726923 1.02296759 1.24775112 1.57834356
0.95936484 1.43110855 1.58671931 1.85413193 1.57924824 1.26347939
1.75882399 1.74760571 1.15790642 2.0486974 1.48234857 1.10284754
1.43029672 0.82640811 0.9493961 1.0348433 1.6269339 1.84570997
1.0197243 1.45401865 1.37947984 2.45653493 1.4449241 0.8425918
1.04461471 1.10303264 0.96977356 2.39780299 0.94036597 3.71955968
1.27305085 1.33806702 1.33191818 3.79478323 2.3979403 1.0150524
0.99101052 1.37006187 1.00723035 1.86668025 1.16631103 1.84514878
1.47887743 1.40420566 1.46817588 1.61028462]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in ROE [135 130 21 9 14 10 30 11 2 15 3 35 601 18 25 22 4 19
23 917 52 24 8 29 82 6 12 38 17 20 7 27 16 687 44 589
463 1 42 64 205 26 5 13 155 28 98 34 41 51 92 228 36 33
582 116 68 63 263 167 103 182 61 43 244 73 45 47 32 121 40 48
596 200 196 37 59 109 60 174 86 142]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Cash Ratio [ 51 77 67 180 272 49 25 14 9 99 47 225 13 74 45 195 131 37
362 39 58 1 70 80 22 175 163 4 24 128 82 84 133 10 53 12
36 20 333 38 0 27 237 48 43 3 182 52 11 8 31 60 26 79
271 2 15 164 201 44 257 94 29 35 958 5 18 81 73 190 496 148
33 121 30 16 189 92 103 41 40 162 317 130 23 108 7 42 54 61
46 260 183 17 136 568 62 71 57 117 6 198 65 21 147 64 34 110
184 68 129 19 116 88 212 115 126 56 127 221 425 83 459 100]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Net Cash Flow [ -604000000 51000000 938000000 -240840000 315120000
-189000000 90885000 287000000 13900000 -308000000
-129000000 21818000 -30351000 166000000 50823000
-2276034000 -34000000 -162000000 -90800000 66000000
1795000000 3390000 13200000 413000000 -281000000
7194000 1333000000 -1300000 -38200000 10000000
698000000 -6430000000 768300000 42000000 -3025000000
-108953000 218000000 22000000 474000000 -431000000
20764000000 -712000000 1386000000 -9600000 584000000
148900000 -433000000 32600000 -3186000000 -268000000
-220100000 -1039361000 -11208000000 -881000000 1120000000
-200481000 3190000 1064000000 758700000 -1710600000
-191000000 -93000000 -3283000000 39289000 2000000
548000000 -47000000 -119000000 76000000 326500000
-171460000 -590000000 39000000 150000000 781000000
-20440000 -8796000 -41000000 -2000000 115100000
108369000 -22000000 -1763000000 228529000 289000000
-116000000 -1610000000 375200000 2288000000 -59000000
-2214800000 848000000 23000000 -325000000 22239000
46300000 -319396000 683000000 -1179000000 533875000
830000000 -4496000000 -116800000 249000000 -35000000
29000000 79000000 -1368707000 1617921000 2196000
523803000 -14756000 4073000 450000000 -513000000
-71065000 64600000 4624000000 -119311000 273599000
28136000 3515000000 14523000 592000000 46600000
-240000000 46000000 194800000 -19000000 -58589000
-43239000 -83906000 -30900000 -26905000 -355228000
254000000 -1603000000 -15576000 2824000000 -809000000
-3857000000 73901000 -363198000 -685000000 63492000
7786000000 83583000 -373409000 175000000 -112818000
162690000 272000000 49000000 -184471000 -1504000000
7523000000 2300000000 13065000 -17388000 -445000000
-28325000 636000000 -790000000 -193542000 -296585000
12747000000 -831000000 -157700000 2448000 114300000
-900000000 412000000 -395000 -271788000 -218700000
-7341000000 2212000 -170000000 -86000000 -1649000000
-211400000 -79600000 -123369000 136400000 -27208000
-235000000 -205200000 -356000000 -51100000 -638127000
301000000 274000000 -107000000 610000000 10906000
1603000 -8000000 85000000 -78836000 5607600000
537900000 239000000 1944000000 -16185000 403700000
35300000 59758000 -584000000 -99000000 1805094000
-952000000 -1098300000 -367000000 1083000000 -1177000000
-5317000 13624000 -910125000 1010500000 151000000
-155000000 -126000000 -6000000 379000000 695722000
-9000000 84000000 -1456000000 128000000 3394000000
915325000 75400000 36442000 -75150000 217100000
-134259000 -588000000 -298400000 -403561000 278800000
-28000000 -1671386000 2962000000 298000000 700900000
160383000 116000000 -533785000 1735000000 -295000000
15900000 31884000 625000000 -563000000 2694000000
-2133000000 -61744000 21000000 -808000000 10853000
-67676000 -62542000 168081000 23000 -42800000
615000000 73000000 165012000 -26010000 -654720000
694000000 88852000 -1016000000 33398000 -167000000
-2630000000 -648000000 237800000 -38000000 497000000
1584000000 -3482000000 -150300000 5000000 10716000
-4636000 -891400000 159000000 6000000 12679000
250000000 -58000000 100145000 -199000000 -463323000
1004000000 -8482000 29159000 3428000000 10400000
-195000000 439000000 1630000000 -3800000 425000000
142787000 637230000 98989000 37051000 89509000
-1803000 -6128000000 65488000 -12100000 -460000000
-254000000 -1268000000 -140000000 -467300000 -568000000
-12000000 -102075000 373520000 5332000 734422000
-911000000 133000000 -43000000 17000000 -1032187000
376000000 -43623000]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Net Income [ 7610000000 5144000000 4423000000 629551000 696878000
1849000000 596541000 636000000 2052300000 2533000000
2196000000 248710000 141555000 356800000 321406000
334906000 848000000 2171000000 153900000 144000000
1377000000 590859000 516000000 6939000000 1562000000
685074000 596000000 442600000 2560000000 1385000000
-23528000000 -6692000000 763500000 -322000000 892000000
741733000 1364000000 476000000 5163000000 5176000000
15888000000 968000000 2084000000 135400000 -1967000000
3547000000 3158000000 280900000 1565000000 -239000000
609700000 583106000 17242000000 2512000000 2834000000
547132000 1520992000 1757000000 1602000000 664900000
840000000 410400000 -14685000000 509699000 -271000000
2094000000 634000000 1384000000 521000000 1247000000
475602000 1399000000 537000000 355000000 -692000000
4050000000 -113891000 203523000 1968000000 878000000
1623600000 319361000 5237000000 4587000000 65900000
1899000000 4526000000 1953000000 1940000000 2297000000
709000000 3357400000 8382000000 1034000000 1450000000
296689000 168800000 869829000 764000000 2816000000
269732000 -14454000000 1725000000 1002100000 1193000000
429100000 1117000000 -4524515000 187774000 870120000
85171000 878485000 232120000 268000000 1979000000
-156734000 494900000 2269000000 457223000 764465000
394950000 7373000000 516361000 3669000000 315000000
-12156000000 578000000 650800000 712000000 241686000
412512000 267669000 489000000 210219000 546421000
-196000000 2965000000 1374561000 18108000000 1339000000
9687000000 705672000 456227000 307000000 768996000
-671000000 451838000 692957000 2129000000 849073000
-559235000 -3056000000 1682000000 752207000 4768000000
2461000000 4554000000 686088000 479058000 558000000
512951000 1276000000 13190000000 192078000 419247000
11420000000 938000000 454600000 123241000 588800000
968100000 427235000 302971000 633700000 24442000000
894115000 1013000000 253000000 7351000000 483500000
329200000 802894000 436900000 423223000 -240000000
2408400000 3605000000 388400000 252111000 2181000000
3433000000 4476000000 3808000000 350745000 487562000
859000000 369416000 4529300000 941300000 7267000000
5310000000 615302000 653500000 401600000 288792000
1599000000 4833000000 546733000 5241000000 1000400000
2852000000 4442000000 -2204000000 1079667000 352820000
-2270833000 847600000 997000000 -2441000000 428000000
2752000000 220000000 122641000 -3362000000 570000000
-769000000 1556000000 973800000 357659000 350000000
283766000 244977000 1093900000 931216000 -7829000000
260100000 407943000 1604000000 888000000 2551360000
1679000000 5452000000 6960000000 1234000000 636056000
1267600000 494090000 6873000000 4106000000 -76400000
437257000 1406000000 682000000 5642000000 4227000000
321824000 1547000000 1228000000 304768000 665783000
357796000 696067000 -713685000 749900000 746000000
1447000000 196000000 225400000 1053849000 284084000
606828000 2421000000 2139375000 1156000000 267046000
1350000000 1933000000 1980000000 798300000 -4556000000
2214000000 1439000000 13345000000 359500000 -214000000
459522000 527100000 1975400000 198000000 3439000000
410395000 1220000000 1540000000 369041000 2986000000
232573000 7340000000 340383000 680528000 5813000000
867100000 4772000000 4844000000 7608000000 411500000
3990000000 221177000 760434000 507577000 375236000
-556334000 419222000 17879000000 469053000 640300000
22894000000 783000000 753000000 -571000000 837800000
506000000 612000000 195290000 -2408948000 984485000
1201560000 16150000000 251200000 474000000 340000000
-4359082000 1293000000 147000000 309471000 339000000]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Earnings Per Share [ 11.39 3.15 2.94 1.26 0.31 2.99 8.91 2.6 3.13
5.88 1.69 1.52 2.08 2.07 1.8 3.01 6.61 5.12
1.6 0.68 1.13 2.46 9.49 9.15 8.6 1.42 1.28
3.93 9.73 4.93 -61.2 -13.18 2.47 -0.31 1.21 5.54
2.86 2.66 3.9 7.52 4.18 1.78 2.59 -4.49 15.38
2.73 2.05 0.94 -0.18 2.72 3.79 5.41 3.54 8.71
1.64 4.44 2.26 2.02 2.97 1.55 -22.43 3.52 -2.43
8.17 3.87 1.53 2.93 3.71 15.3 7.86 1.9 -1.61
7.15 -0.28 1.79 2. 1.58 2.67 2.01 4.66 0.54
3.21 5.68 2.17 4.03 5.14 4.92 4.81 4.95 5.08
1.56 4.68 5.52 4. 4.05 1.27 -35.55 1.43 3.38
4.07 3.61 5.71 -8.29 3.25 2.37 0.56 2.77 3.5
0.92 4.25 -0.99 2.3 2.685 2.42 5.87 1.86 1.77
1.31 1.97 -11.31 1.37 2.22 3.04 1.73 2.85 3.66
5.42 -0.29 9.23 12.37 1.02 6.11 4.65 2.39 1.14
11.69 -0.79 0.82 2.35 -1.21 -10.78 3.28 5.78 0.22
8.54 13.48 5.19 2.41 2.25 1.11 0.58 15.87 5.16
3.69 1.62 6.05 2.78 0.1 4.41 2.31 5.03 3.03
-2.97 2.27 11.62 3.36 0.74 3.3 9.71 9.62 3.08
3.22 1.03 1.08 4.82 4.7 4.49 4.61 3.14 4.31
7.72 2.79 5.29 -3.26 7.22 12.75 -13.03 -6.07 2.56
0.43 0.29 -21.18 -1.99 5.13 1.3 1.09 1.17 4.43
9.32 -10.23 0.86 2.04 4.52 1.81 50.09 3.32 4.11
2.16 1.38 4.42 -0.42 3.94 5.18 1.01 7.78 1.59
5.39 1. 5.75 6.17 6.92 -4.29 2.14 5.22 1.04
1.63 11.38 4.26 3.02 5.43 3.62 4.53 4.21 3.82
1.94 -1.53 4.96 10.99 12.5 1.98 19.52 6.89 6.1
3.51 5.51 5.38 8.72 4.13 8. 1.66 3.07 3.29
-2.31 4.38 5.7 2.36 9.95 -0.76 0.89 1.93 -25.92
4.22 3.85 0.42 1.88 -4.64 0.78 1.2 ]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in Estimated Shares Outstanding [6.68129938e+08 1.63301587e+09 1.50442177e+09 4.99643651e+08
2.24799355e+09 6.18394649e+08 6.69518519e+07 2.44615385e+08
4.21897810e+08 4.30782313e+08 1.29940828e+09 1.63625000e+08
6.80552885e+07 1.72367150e+08 1.78558889e+08 1.11264452e+08
1.28290469e+08 4.24023438e+08 9.61875000e+07 2.11764706e+08
1.21858407e+09 2.40186585e+08 5.43730242e+07 7.58360656e+08
1.81627907e+08 4.82446479e+08 4.65625000e+08 1.12620865e+08
2.63103803e+08 2.80933063e+08 3.84444444e+08 5.07738998e+08
3.09109312e+08 1.03870968e+09 7.37190083e+08 1.33886823e+08
1.04405594e+09 1.78947368e+08 5.06660363e+08 6.88297872e+08
8.45069512e+08 5.43820225e+08 8.04633205e+08 7.52222222e+07
4.38084632e+08 2.30624187e+08 1.15677656e+09 1.37024390e+08
1.66489362e+09 1.32777778e+09 2.24154412e+08 1.53853826e+08
3.18706100e+09 7.09604520e+08 3.25373134e+08 3.33617073e+08
3.42565766e+08 7.77433628e+08 7.93069307e+08 2.23872054e+08
5.41935484e+08 1.31118211e+08 6.54703522e+08 1.44800852e+08
1.11522634e+08 2.56303550e+08 1.63824289e+08 9.04575163e+08
1.77815700e+08 3.36118598e+08 3.10850980e+07 1.77989822e+08
2.82631579e+08 1.18729097e+08 4.29813665e+08 5.66433566e+08
4.06753571e+08 1.40335196e+08 9.84000000e+08 5.55696202e+08
6.08089888e+08 1.58886070e+08 1.12381974e+09 1.86463415e+09
1.22037037e+08 5.91588785e+08 7.96830986e+08 9.00000000e+08
5.32235888e+08 4.46887160e+08 1.44105691e+08 6.98004158e+08
1.69333333e+09 2.85433071e+08 1.90185256e+08 3.60683761e+07
1.57577717e+08 1.91000000e+08 6.95308642e+08 2.12387402e+08
4.06582278e+08 1.20629371e+09 2.96479290e+08 2.93120393e+08
1.18864266e+08 3.56869010e+08 1.48511384e+08 5.45779855e+08
5.77766154e+07 3.67139240e+08 1.52091071e+08 3.17142599e+08
6.63200000e+07 2.91304348e+08 4.65647059e+08 1.58317172e+08
2.15173913e+08 2.99887089e+08 1.88935124e+08 1.30232538e+08
2.49968354e+08 3.96397850e+09 2.91729378e+08 2.80076336e+09
1.59898477e+08 1.07480106e+09 2.93153153e+08 2.34210526e+08
1.39702890e+08 1.44741053e+08 1.33168657e+08 1.33606557e+08
6.91509868e+07 1.00815683e+08 6.75862069e+08 3.21235103e+08
1.58299351e+08 1.46386419e+09 1.31274510e+09 1.58543372e+09
1.51757419e+08 1.90889958e+08 2.69298246e+08 6.57823781e+07
8.49367089e+08 1.25162881e+08 4.14202335e+08 3.61307660e+08
4.62177686e+08 2.83487941e+08 4.15308642e+08 2.02751213e+08
7.80360066e+08 5.13987730e+08 1.99237805e+08 8.28820069e+07
2.53636364e+09 1.49414520e+08 9.78486647e+08 9.27913043e+07
8.07797688e+07 4.73858921e+09 4.16888889e+08 4.09549550e+08
2.12484483e+08 3.71014493e+07 3.68023256e+08 4.28362832e+08
1.15781843e+08 1.25194628e+08 3.91172840e+08 4.04000000e+09
4.44833333e+08 3.64388489e+08 2.53000000e+09 4.34970414e+09
1.09637188e+08 1.42510822e+08 2.07466150e+08 1.78246546e+08
2.13598256e+08 8.08080808e+07 1.06096916e+09 3.10240964e+08
1.15595238e+08 3.40690540e+08 6.60909091e+08 3.53553038e+08
4.65280665e+08 1.13333333e+09 7.95340136e+07 2.66770186e+08
3.44660194e+08 3.42051852e+08 9.39688797e+08 2.00276596e+08
1.61848552e+09 1.15184382e+09 2.37568340e+08 1.27898089e+08
6.70051044e+07 5.31229236e+08 6.26036269e+08 1.46954178e+09
1.96292135e+09 3.58566308e+08 5.39130435e+08 2.81139240e+09
6.76073620e+08 1.49538366e+08 2.76721569e+07 1.74277283e+08
3.74812030e+08 4.02141680e+08 1.67187500e+08 4.50409165e+08
5.11627907e+08 4.22900000e+08 1.58734655e+08 3.67741936e+08
3.86432161e+08 3.03313840e+08 2.41637717e+08 3.22215315e+08
2.69230769e+08 2.60335780e+08 2.09382051e+08 2.46930023e+08
9.99158798e+07 7.65298143e+08 3.02441860e+08 1.99972059e+08
3.54867257e+08 4.90607735e+08 5.09355161e+07 5.05722892e+08
6.15929204e+09 3.00243309e+08 4.91391569e+08 5.86851852e+08
3.58036232e+08 1.55497738e+09 5.46010638e+08 1.81904762e+08
1.10978934e+08 2.71428571e+08 6.75247525e+08 4.56103476e+08
5.43316195e+08 2.02405031e+08 2.87012987e+08 1.22800000e+09
5.30031304e+07 2.19730363e+08 1.03088493e+08 1.31542647e+08
1.00587717e+08 1.66360140e+08 3.50420561e+08 1.42911877e+08
1.39134615e+09 1.38282209e+08 9.26053603e+07 2.78513726e+08
1.29664103e+08 9.31153846e+08 3.63839286e+08 2.71361502e+08
8.84258278e+07 2.48618784e+08 5.33977901e+08 4.37086093e+08
1.89619952e+08 8.32330827e+08 3.76701571e+08 5.63080169e+09
1.85309278e+08 1.39869281e+08 2.25255882e+08 1.25201900e+08
3.98266129e+08 1.43478261e+08 3.12920837e+08 1.35443894e+08
1.23200000e+08 1.86384343e+08 3.76024590e+08 2.61833077e+08
9.87703919e+07 9.52950820e+08 2.47037037e+08 8.66061706e+08
9.00371747e+08 8.72477064e+08 9.96368039e+07 4.98750000e+08
1.33239157e+08 2.10646537e+08 1.65334528e+08 1.14053495e+08
2.40837229e+08 3.32715873e+08 4.08196347e+09 8.22900000e+07
2.71313559e+08 5.47703349e+09 7.86934673e+07 4.53614458e+08
7.51315790e+08 5.68539326e+08 1.18146718e+08 1.01186528e+08
9.29378086e+07 5.07466495e+08 2.84729858e+08 4.19480520e+09
1.12857143e+09 1.80851064e+08 9.39457328e+08 4.35353535e+08
1.88461538e+08 2.57892500e+08 4.98529412e+08]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in P/E Ratio [ 3.71817366 18.80634984 15.2755102 74.55555714 178.4516129
12.26755853 31.04040483 16.62692308 18.45654341 10.18707517
36.66863964 26.33552566 38.72115433 19.77777729 29.23888944
18.60797276 12.18003056 12.12695313 41.19999875 280.5147059
16.52212389 21.78455285 16.83456217 17.74098383 12.37441837
68.27464577 528.0390742 15.18066158 14.33093546 18.70385375
93.0892875 21.14574899 18.68760706 31.99173471 33.236463
18.24940665 22.46240602 10.26350566 19.22739309 13.00478493
21.43258539 14.59845598 105.2444456 19.91872601 15.09890147
35.47805024 73.18085213 31.46896164 15.89338235 33.6517153
9.56561922 19.19773983 13.41561401 21.08536707 19.47072005
24.10619469 59.2871297 13.74074108 16.89677484 13.55910495
28.40792888 17.61931818 20.81987609 17.91064896 15.28940517
43.54248562 14.27645119 24.42048464 31.36274549 11.19720127
18.98947474 22.01003278 17.31307587 10.0951049 33.99441229
12.9750005 15.92405063 22.47940075 37.63681692 20.98068605
36.56910528 171.9629648 21.07165078 8.9242956 30.69124332
14.84223297 10.43190642 14.45934939 14.63970579 21.22828323
16.87598484 48.4743609 22.20726496 11.10688424 23.29999925
17.62716025 54.88976299 19.21678322 33.8402358 15.79115405
30.85041634 18.91693259 11.82311769 93.046152 34.42615865
18.4368231 68.40285829 32.21739022 12.24470612 34.33913174
18.63636281 21.17546899 55.82911329 7.57526882 23.06214689
79.89313282 28.17258883 22.81195132 23.16058394 27.2972964
30.08552599 16.22543353 16.5684214 20.93532438 10.6912571
48.0592125 12.17527638 14.51898734 14.88190693 47.36697339
8.18027502 17.92156961 5.56628445 18.47096753 15.55230042
28.65789298 17.33019641 18.65928006 13.48780488 13.15758696
28.94893574 10.73086395 12.23450108 16.95090016 25.30952381
24.07012104 27.36851246 69.72727273 20.90280972 10.20919844
35.22705217 23.05202293 14.29460622 16.755556 20.97297387
46.56896552 34.41461708 17.96124031 14.81415929 19.88075908
17.33471116 17.03703704 10.91404942 13.16417861 45.79136799
149.2 25.42011775 16.93197234 18.19047619 12.63824289
15.90093725 37.11894361 9.29315491 23.49999865 13.04848515
5.59835232 9.03326424 28.97619077 20.59183628 26.1980526
27.47572718 25.15740741 24.51037324 21.34893532 9.98663697
10.45770043 73.12355174 27.24840701 31.68909559 18.42192724
19.51295324 21.80149775 9.88888889 9.79962193 33.43037975
16.78393352 26.59843176 4.30451128 22.72265547 17.00327316
41.8372093 394.4137828 30.06451484 16.48927797 17.88833648
36.30630541 33.90769385 21.07692308 17.07900767 27.19098691
18.77906977 10.12254902 10.48672611 29.38673978 25.45318329
11.65361416 26.93261402 28.5663708 10.94403893 14.72222176
12.91304348 19.88914118 12.67420186 14.57922079 16.36548299
19.07722008 33.7920802 6.58124527 10.5141392 12.73584906
18.99814508 36.200001 9.88347861 33.40263993 87.98541248
17.33088199 27.42629957 20.5560757 11.58812299 31.66346154
82.55172759 27.36196196 110.7647088 11.79700833 17.99615423
33.06802755 23.14084554 39.93377417 11.83425414 14.64900684
11.43233083 24.32984346 48.4123701 12.50980392 13.57719715
28.59879153 61.77536232 10.26933585 28.21782178 8.42960024
25.15151465 19.16433601 2.93545077 28.9 17.34252511
19.28524574 9.48433077 14.19237695 17.88661766 11.01720183
19.56416538 8.83874987 57.21084398 27.68975042 25.0423443
26.55319179 39.60292786 44.78571429 10.55251164 23.61052667
21.74152585 14.76080352 32.15060181 10.98773006 33.68539326
14.02509691 35.84974197 18.51030928 9.28436019 20.24675247
19.41489362 17.68221394 131.5256359 22.74999917 70.47058529]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique values in P/B Ratio [-8.78421945e+00 -8.75006804e+00 -3.94171377e-01 4.19965109e+00
1.05980998e+00 7.49683072e+00 1.29064585e+02 -7.19496855e-01
-3.02264878e+00 -1.88391201e+00 -4.32713829e+00 -1.26933216e+00
-4.07261517e+00 -9.85570628e+00 4.28235752e+00 -1.36497235e+01
-1.11465802e+00 -8.77452891e-01 -1.41713889e+01 3.85775599e+00
-4.49034237e+00 -3.10153798e+01 2.40123217e+01 -1.33983799e+01
-2.08135771e+01 3.90442953e+00 -7.97010393e+00 -3.10067734e+01
-7.75985560e+00 4.97080925e+00 -1.28609384e+01 8.20292338e+00
2.63981367e+00 2.90291480e-01 -3.08947652e+00 3.95497461e+00
-4.89529412e+00 -6.09073508e-01 2.20326121e+01 -9.38006774e-01
8.63704546e+00 -8.52562380e-01 -4.12776957e+00 1.34905440e+01
1.62602199e-01 -3.32129829e+00 -3.89565682e+00 5.88025559e-01
-3.88092050e+00 1.04481548e+00 -1.16753335e+00 6.26405255e+00
-1.74661009e+01 -3.41530183e+00 -1.06666788e+01 -7.47716562e+00
-4.32005119e+00 -3.93528350e-01 -6.30960570e-02 -9.42813353e+00
-1.84052775e+00 1.11780419e+00 -7.61190775e+01 -8.80528127e+00
-5.48323699e-01 -5.86495365e+01 1.72013290e+01 2.14394282e+01
6.36871510e-02 -1.30549296e+00 5.16502890e-01 -7.25643348e-01
-2.23147395e-01 8.55095541e-01 9.02439024e-01 -1.33832119e+01
7.12164449e+00 -1.76501314e+00 -7.01980905e+00 4.76393721e+00
5.67399059e+00 -7.60494471e+00 -1.67300221e+01 3.25222222e+00
6.27728687e+00 -3.75933827e-01 -4.55221439e+00 -1.37592304e+01
-3.98503937e+00 -6.62151724e-01 -7.48931346e+00 -1.18774408e+01
-2.29343975e+00 -1.27172775e+01 -4.42681108e+00 1.96252695e+00
1.78561644e+00 4.60169855e+00 -1.49288674e+01 -8.11682125e+00
-6.36928380e+00 -1.23088208e+01 1.41624318e+00 2.38567280e+01
9.56795153e+00 -1.16983338e+00 -5.97313433e-01 -8.63959070e+00
6.17402389e+00 6.34974742e+00 -1.71588003e+00 5.99145874e+00
-4.41034942e+01 -1.41514453e+01 5.10875627e+00 4.42742519e+00
5.88446716e+00 -2.10070794e+00 2.93542695e+00 -6.07256055e+00
-1.90866103e+01 -7.97573034e+00 4.01471293e+00 1.49926228e+01
6.74676025e+00 5.10154601e+00 -3.97339544e+00 2.25637911e+01
1.04977041e+01 4.24299831e+00 3.15944610e+00 3.61761016e+00
-4.89203675e+00 7.20524245e+00 5.76005679e+00 3.83589577e+00
1.21128792e+01 1.73458569e+01 1.21453261e+01 -6.50573700e-02
-7.27905120e+00 -3.73804696e+00 6.26481675e+00 6.06938909e+00
-3.70982592e+00 -1.98048307e+00 5.92567697e+00 -1.13548387e-01
4.85239120e+00 -9.81083310e-01 5.04769952e+00 6.12393390e+00
2.65657721e-01 -2.76365122e+00 4.26075000e+01 7.58647709e+00
4.21861998e+00 2.82384519e+00 6.29494262e+00 2.75223607e+00
-1.88688119e+00 -2.53301086e+00 -1.89407115e+00 -1.46630663e+00
-2.01209100e+00 2.93100547e+00 -3.07831974e-01 -1.29484372e+00
-8.57290222e-01 1.42807500e+01 -6.51102807e-01 -1.08528544e+01
-4.60659114e+00 1.98214162e+01 -5.11719395e+00 -2.24577338e+00
1.03163539e+01 3.45176471e+00 -1.23701979e+01 2.21957746e+00
4.53525099e+00 7.12214514e+00 8.61558483e+00 -1.28095060e+01
-3.98031573e+00 6.49575517e+00 3.05088697e+00 -1.95019387e+00
2.02384440e+00 -5.19073368e+00 -6.63297081e+00 5.84661735e+00
5.79822581e+00 -4.28293111e+00 1.27352995e+00 4.40399354e+00
-1.29800623e+00 -4.21330891e+00 -1.88094283e+00 1.17122901e+00
-1.17173832e+01 -7.35331395e+00 6.97186364e+00 -5.70016789e+00
-1.38596074e-01 -1.23755263e+01 9.58253576e+00 9.26433162e-01
1.11681066e+01 -2.07554286e+00 -8.02511003e+00 -1.04640982e+01
-3.64026219e-01 3.34510155e+00 -4.26858900e-01 -7.33207433e-01
6.29052120e+00 -1.12105856e+00 -1.05242872e+00 -3.61858249e-01
-4.52699514e+00 -2.25674696e+00 -8.43313348e-01 -1.41802706e+00
-6.94125670e-01 -6.57486911e+00 -6.08922771e+00 -5.93157895e-01
-2.82711144e+00 -4.17892734e+00 7.02905607e+00 4.29189430e+00
5.74886878e-01 5.43403909e+00 -1.20208938e+01 -1.57274805e+01
2.04089995e+01 4.08947221e+00 -1.62154690e+01 5.25089724e-01
-2.42822510e+00 -4.01646113e+00 -1.30089841e-01 -2.58040816e+00
-2.71690772e+00 2.82536561e+00 -7.96157903e+00 -2.79545642e+00
-1.88641943e+01 -8.54722222e+00 -2.48137610e+00 -4.04496970e+00
7.41377678e+00 -8.42213189e-01 7.02678249e+00 -2.35373233e+01
-2.53851293e+01 4.06808411e+00 -1.27265533e+01 -2.80325119e+01
2.62757576e+00 -8.91599302e-01 6.01095386e+00 4.59415584e+00
-2.34739137e+00 2.76805090e+00 -2.31952916e+01 6.25590309e+00
1.06689857e+00 1.06955822e+00 -1.31980547e+01 9.47139976e+00
1.52621554e+01 -2.66190517e-01 -1.08191192e+00 -1.36174399e+01
4.07654319e+00 2.55967070e+00 -4.04075101e+00 -2.63806868e+01
2.95471503e+01 -1.85099485e+00 -4.50863346e+01 -1.41529880e+00
-1.45611208e+01 -8.04377178e+00 2.28480237e+00 -1.02499673e+01
1.26957118e+01 7.18612812e+00 -2.26192667e+00 -7.76267729e+00
-2.70644272e+00 -2.95949367e-01 4.13047059e+00 6.26177457e+00
-3.83825986e+00 -2.38844490e+01 1.72306785e+00]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
No issues when checking unique values.
#Plotting boxplot and histogram for numeric data
def histogram_boxplot(data, feature, figsize=(12, 7), kde=True, bins=None):
    """
    data: df
    feature: numeric column
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,
        sharex=True,  # same x-axis
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="lightgreen"
    )
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
#Barplot for categorical columns
def labeled_barplot(data, feature, perc=False, n=None):
    """
    data: dataframe
    feature: dataframe column
    perc: if True, label bars with percentages instead of counts
    n: number of most frequent levels to show (None shows all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # top of the bar
        # annotate the bar with its count or percentage
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )
    plt.show()
df['Ticker Symbol'].value_counts()
AAL 1
NEE 1
NUE 1
NTRS 1
NSC 1
..
EQR 1
EQIX 1
EOG 1
EMN 1
ZTS 1
Name: Ticker Symbol, Length: 340, dtype: int64
All values are distinct.
df['Security'].value_counts()
American Airlines Group 1
NextEra Energy 1
Nucor Corp. 1
Northern Trust Corp. 1
Norfolk Southern Corp. 1
..
Equity Residential 1
Equinix 1
EOG Resources 1
Eastman Chemical 1
Zoetis 1
Name: Security, Length: 340, dtype: int64
All values are distinct.
df['GICS Sector'].value_counts()
Industrials 53
Financials 49
Health Care 40
Consumer Discretionary 40
Information Technology 33
Energy 30
Real Estate 27
Utilities 24
Materials 20
Consumer Staples 19
Telecommunications Services 5
Name: GICS Sector, dtype: int64
labeled_barplot(df, 'GICS Sector')
GICS Sector is the economic sector assigned to each company. The largest sectors are Industrials (53), Financials (49), Health Care (40) and Consumer Discretionary (40). Telecommunications Services is the smallest sector in the data set, with only 5 companies.
df['GICS Sub Industry'].value_counts()
Oil & Gas Exploration & Production 16
REITs 14
Industrial Conglomerates 14
Electric Utilities 12
Internet Software & Services 12
..
Technology Hardware, Storage & Peripherals 1
Real Estate Services 1
Trucking 1
Networking Equipment 1
Casinos & Gaming 1
Name: GICS Sub Industry, Length: 104, dtype: int64
labeled_barplot(df, 'GICS Sub Industry')
GICS Sub Industry is the specific sub-industry group assigned to each company. There are 104 distinct sub-industries; the most frequent are Oil & Gas Exploration & Production (16), REITs (14) and Industrial Conglomerates (14).
histogram_boxplot(df, 'Current Price')
The distribution of current price is right skewed, with a median of about 59.71. There are outliers on the right side.
df['Current Price'].median()
59.705
df['Current Price'].max()
1274.949951
The highest value for current price is approximately 1274.95.
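The right skew read off the histogram can be cross-checked numerically. The following is a minimal sketch, assuming a small made-up price series (the values are illustrative and not taken from the data set); on the real data the same calls would be applied to df['Current Price'].

```python
import pandas as pd

# Hypothetical toy prices: mostly moderate values plus one very large one,
# mimicking the long right tail seen in the histogram.
prices = pd.Series([20.0, 35.0, 45.0, 60.0, 75.0, 90.0, 1275.0], name="Current Price")

skewness = prices.skew()  # sample skewness; a value above 0 indicates right skew
mean_above_median = prices.mean() > prices.median()  # typical of right-skewed data

print(f"skewness={skewness:.2f}, mean={prices.mean():.2f}, median={prices.median():.2f}")
```

A positive skewness together with a mean well above the median is consistent with a long right tail, which is why the median is the more representative "center" for this column.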
histogram_boxplot(df, 'Price Change')
The distribution for price change is slightly left skewed, but still resembles a normal distribution with outliers present on both sides.
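The outliers visible in the box plots can also be counted directly. The sketch below assumes the common 1.5 × IQR rule (the default whisker length for seaborn and matplotlib box plots); the helper name iqr_outlier_mask and the toy values are hypothetical and only illustrate the idea.

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical toy data with one extreme value on each side
changes = pd.Series([-50, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

mask = iqr_outlier_mask(changes)
outliers = changes[mask].tolist()  # values flagged on either side
print(f"{mask.sum()} outliers: {outliers}")
```

On the real data, calling the helper on df['Price Change'] would give the number of points drawn beyond the whiskers, up to small differences in how quartiles are interpolated.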
histogram_boxplot(df, 'Volatility')
The volatility represents the standard deviation of the stock price over the last 13 weeks. The distribution for volatility is right skewed with outliers. The data is centered around 1.4.
histogram_boxplot(df, 'ROE')
ROE is a measure of financial performance. The distribution for ROE is right skewed with outliers present. The data is centered around 15.
histogram_boxplot(df, 'Cash Ratio')
Cash Ratio represents the ratio of a company's cash to its liabilities. The distribution is right skewed with outliers present. The data is centered around 47.
histogram_boxplot(df, 'Net Cash Flow')
df['Net Cash Flow'].median()
2098000.0
Net Cash Flow is the difference between cash inflows and cash outflows, in dollars. The distribution resembles a normal distribution with outliers present on both sides. The data is centered around a median of 2098000.
histogram_boxplot(df, 'Net Income')
df['Net Income'].median()
707336000.0
Net income represents revenue after expenses, interest and taxes, in dollars. The distribution is right skewed with outliers present on both sides. The data is centered around a median of 707336000.
histogram_boxplot(df, 'Earnings Per Share')
df['Earnings Per Share'].median()
2.895
Earnings per share is the company's net income divided by its total number of common shares, expressed in dollars. The distribution for Earnings per share resembles a normal distribution with outliers present on both sides. The data is centered around a median of 2.9.
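The relationship between net income, earnings per share and share count can be checked with a short calculation. This is a sketch under two assumptions: that Estimated Shares Outstanding was derived as Net Income / Earnings Per Share, and that the first entries of the unique-value listings printed earlier (7610000000, 11.39 and 6.68129938e+08) belong to the same company. The function name estimated_shares is hypothetical.

```python
def estimated_shares(net_income: float, eps: float) -> float:
    """Estimate the share count implied by net income and earnings per share."""
    if eps == 0:
        raise ValueError("EPS of zero: the implied share count is undefined")
    return net_income / eps

# First values from the unique-value listings (assumed to be the same row)
first_net_income = 7_610_000_000
first_eps = 11.39

shares = estimated_shares(first_net_income, first_eps)
print(f"{shares:.8e}")
```

The result agrees with the first listed value of Estimated Shares Outstanding to the printed precision, which supports, but does not prove, the assumed derivation.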
histogram_boxplot(df, 'Estimated Shares Outstanding')
df['Estimated Shares Outstanding'].median()
309675137.79999995
The distribution for estimated shares outstanding is heavily right skewed with outliers present. The data is centered around a median of about 309.7 million shares.
histogram_boxplot(df, 'P/E Ratio')
df['P/E Ratio'].median()
20.81987609
P/E ratio represents a company's stock price relative to its earnings per share. The distribution for P/E ratio is right skewed with outliers present. The data is centered around a median of 20.8.
histogram_boxplot(df, 'P/B Ratio')
df['P/B Ratio'].median()
-1.0671703205
P/B ratio represents a company's stock price relative to its book value per share. The distribution for P/B ratio resembles a normal distribution with outliers present on both sides. The data is centered around a median of -1.1.
num_cols = df.select_dtypes(include=np.number).columns.tolist()
plt.figure(figsize=(10,5))
sns.heatmap(df[num_cols].corr(), annot=True)
sns.pairplot(data=df[num_cols], diag_kind="kde")
plt.show()
Volatility seems to be negatively correlated with price change. Earnings per share seems to be correlated with current price and net income.
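Reading pairs off a dense heatmap is error-prone, so the strongest correlations can also be listed programmatically. The sketch below is one possible approach, not the notebook's method: the helper top_correlations and the toy columns x, y, z are hypothetical. On the real data it would be called as top_correlations(df[num_cols]).

```python
import numpy as np
import pandas as pd

def top_correlations(frame: pd.DataFrame, k: int = 3) -> pd.Series:
    """Return the k column pairs with the largest absolute Pearson correlation."""
    corr = frame.corr()
    # Keep the strict upper triangle so each pair appears once and self-pairs are dropped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(k)

# Hypothetical toy data: y is an exact multiple of x, z is only weakly related
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 6, 8, 10, 12],
    "z": [5, 3, 6, 2, 7, 1],
})

top = top_correlations(toy, k=3)
print(top)
```

Sorting by absolute value keeps strong negative relationships, such as the one between volatility and price change, near the top of the list instead of at the bottom.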
Questions:
histogram_boxplot(df, 'Current Price')
The distribution for Current price (of Stock in dollars) is right skewed with outliers present on the right. The data is centered around 59.71 with the max value being approximately 1274.95.
q2 = df.groupby('GICS Sector')['Price Change'].mean()
q2
GICS Sector Consumer Discretionary 5.846093 Consumer Staples 8.684750 Energy -10.228289 Financials 3.865406 Health Care 9.585652 Industrials 2.833127 Information Technology 7.217476 Materials 5.589738 Real Estate 6.205548 Telecommunications Services 6.956980 Utilities 0.803657 Name: Price Change, dtype: float64
plt.figure(figsize= (10,5))
sns.boxplot(data=df, x='GICS Sector', y='Price Change')
plt.xticks(rotation=90);
Healthcare, Consumer Staples and IT are the top three sectors in price change on average respectively. Healthcare and IT do have a wider distribution compared to other sectors. Energy has the highest variance and is also the lowest in terms of price change.
plt.figure(figsize=(10,5))
sns.heatmap(df[num_cols].corr(), annot=True)
<Axes: >
A heatmap of the numerical columns has been created to find any correlations between variables. Earnings per share is slightly correlated with Current price and Net income. Estimated shares outstanding and Net income are also slightly correlated.
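Reading the strongest relationships off an annotated heatmap can be error-prone, so the pairs can also be ranked numerically. The following is a minimal sketch, not part of the original analysis: `top_correlations` is a hypothetical helper and the `toy` frame is made-up data standing in for `df[num_cols]`.

```python
import numpy as np
import pandas as pd


def top_correlations(frame, n=5):
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame.corr()
    # Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Rank by magnitude so strong negative correlations are not overlooked.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Hypothetical toy data, not the stock dataset.
toy = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "b": [2.0, 4.1, 6.0, 8.2, 10.0],
        "c": [5.0, 3.0, 4.0, 1.0, 2.0],
    }
)
top = top_correlations(toy, n=2)
print(top)
```

On the notebook's data the call would presumably be `top_correlations(df[num_cols])`.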
q4 = df.groupby('GICS Sector')['Cash Ratio'].mean()
q4
GICS Sector Consumer Discretionary 49.575000 Consumer Staples 70.947368 Energy 51.133333 Financials 98.591837 Health Care 103.775000 Industrials 36.188679 Information Technology 149.818182 Materials 41.700000 Real Estate 50.111111 Telecommunications Services 117.000000 Utilities 13.625000 Name: Cash Ratio, dtype: float64
plt.figure(figsize= (10,5))
sns.boxplot(data=df, x='GICS Sector', y='Cash Ratio')
plt.xticks(rotation=90);
IT, Telecommunications Services and Healthcare have the highest average Cash Ratio, in that order. IT and Healthcare have the highest variance compared to other sectors. The Utilities sector has the lowest average cash ratio.
q5 = df.groupby('GICS Sector')['P/E Ratio'].mean()
q5
GICS Sector Consumer Discretionary 35.211613 Consumer Staples 25.521195 Energy 72.897709 Financials 16.023151 Health Care 41.135272 Industrials 18.259380 Information Technology 43.782546 Materials 24.585352 Real Estate 43.065585 Telecommunications Services 12.222578 Utilities 18.719412 Name: P/E Ratio, dtype: float64
plt.figure(figsize= (10,5))
sns.boxplot(data=df, x='GICS Sector', y='P/E Ratio')
plt.xticks(rotation=90);
Energy, IT, Real Estate and Healthcare have the highest P/E ratio on average respectively. Energy has very high variance in P/E ratio relative to other sectors.
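The sector comparisons above quote means from `groupby` but judge variance from the boxplots. A small sketch of how both could be tabulated together follows; `sector_summary` is a hypothetical helper and the `toy` frame is invented data, not values from the dataset.

```python
import pandas as pd


def sector_summary(frame, group_col, value_col):
    """Mean, sample standard deviation and count of value_col within each group."""
    summary = frame.groupby(group_col)[value_col].agg(["mean", "std", "count"])
    return summary.sort_values("mean", ascending=False)


# Hypothetical toy data with one widely spread group and one tight group.
toy = pd.DataFrame(
    {
        "GICS Sector": ["Energy", "Energy", "Energy", "Utilities", "Utilities", "Utilities"],
        "P/E Ratio": [10.0, 90.0, 140.0, 17.0, 18.0, 19.0],
    }
)
summary = sector_summary(toy, "GICS Sector", "P/E Ratio")
print(summary)
```

Applied to the notebook's frame, this would be something like `sector_summary(df, 'GICS Sector', 'P/E Ratio')`.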
df.duplicated().sum()
0
There are no duplicate values.
df.isna().sum()
Ticker Symbol 0 Security 0 GICS Sector 0 GICS Sub Industry 0 Current Price 0 Price Change 0 Volatility 0 ROE 0 Cash Ratio 0 Net Cash Flow 0 Net Income 0 Earnings Per Share 0 Estimated Shares Outstanding 0 P/E Ratio 0 P/B Ratio 0 dtype: int64
There are no missing values.
#Viewing boxplots for outliers
outliers = df.select_dtypes(include=np.number).columns.tolist()
plt.figure(figsize=(15, 10))
for i, variable in enumerate(outliers):
    plt.subplot(3, 5, i + 1)
    sns.boxplot(data=df, x=variable)
plt.tight_layout(pad=2)
plt.show()
Outliers are present in the data, but they will not be treated as they might hold valuable insight.
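To put a number on how many outliers each column has, the boxplot whisker rule can be applied directly. This is a sketch under the assumption that the usual 1.5 x IQR rule is the one of interest; `iqr_outlier_counts` is a hypothetical helper and `toy` is made-up data.

```python
import pandas as pd


def iqr_outlier_counts(frame, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the matching columns.
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


# Hypothetical toy data: column "x" has one extreme value, column "y" has none.
toy = pd.DataFrame({"x": [1, 2, 3, 4, 100], "y": [10, 11, 12, 13, 14]})
counts = iqr_outlier_counts(toy)
print(counts)
```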
No Feature engineering required.
#dropping first two columns as they are all unique values
df.drop("Ticker Symbol", axis=1, inplace= True)
df.drop("Security", axis=1, inplace= True)
df.head()
| | GICS Sector | GICS Sub Industry | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Industrials | Airlines | 42.349998 | 9.999995 | 1.687151 | 135 | 51 | -604000000 | 7610000000 | 11.39 | 6.681299e+08 | 3.718174 | -8.784219 |
| 1 | Health Care | Pharmaceuticals | 59.240002 | 8.339433 | 2.197887 | 130 | 77 | 51000000 | 5144000000 | 3.15 | 1.633016e+09 | 18.806350 | -8.750068 |
| 2 | Health Care | Health Care Equipment | 44.910000 | 11.301121 | 1.273646 | 21 | 67 | 938000000 | 4423000000 | 2.94 | 1.504422e+09 | 15.275510 | -0.394171 |
| 3 | Information Technology | Application Software | 93.940002 | 13.977195 | 1.357679 | 9 | 180 | -240840000 | 629551000 | 1.26 | 4.996437e+08 | 74.555557 | 4.199651 |
| 4 | Information Technology | Semiconductors | 55.320000 | -1.827858 | 1.701169 | 14 | 272 | 315120000 | 696878000 | 0.31 | 2.247994e+09 | 178.451613 | 1.059810 |
#Creating a list of numerical columns
num_cols = df.select_dtypes(include=np.number).columns.tolist()
print(num_cols)
['Current Price', 'Price Change', 'Volatility', 'ROE', 'Cash Ratio', 'Net Cash Flow', 'Net Income', 'Earnings Per Share', 'Estimated Shares Outstanding', 'P/E Ratio', 'P/B Ratio']
# Scaling the data set before clustering with standard scaler on numerical columns
scaler = StandardScaler()
subset = df[num_cols].copy()
subset_scaled = scaler.fit_transform(subset)
#Scaled data set
scaled_df = pd.DataFrame(subset_scaled, columns=subset.columns)
#First 5 rows of scaled data
scaled_df.head()
| | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.393341 | 0.493950 | 0.272749 | 0.989601 | -0.210698 | -0.339355 | 1.554415 | 1.309399 | 0.107863 | -0.652487 | -0.506653 |
| 1 | -0.220837 | 0.355439 | 1.137045 | 0.937737 | 0.077269 | -0.002335 | 0.927628 | 0.056755 | 1.250274 | -0.311769 | -0.504205 |
| 2 | -0.367195 | 0.602479 | -0.427007 | -0.192905 | -0.033488 | 0.454058 | 0.744371 | 0.024831 | 1.098021 | -0.391502 | 0.094941 |
| 3 | 0.133567 | 0.825696 | -0.284802 | -0.317379 | 1.218059 | -0.152497 | -0.219816 | -0.230563 | -0.091622 | 0.947148 | 0.424333 |
| 4 | -0.260874 | -0.492636 | 0.296470 | -0.265515 | 2.237018 | 0.133564 | -0.202703 | -0.374982 | 1.978399 | 3.293307 | 0.199196 |
We have scaled the data so that columns with larger values do not dominate the distance calculations used in clustering. We can see in the first five rows that the data has been scaled.
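For reference, standard scaling subtracts each column's mean and divides by its standard deviation. The NumPy sketch below reproduces that arithmetic on made-up numbers; `standardize` is a hypothetical helper written for illustration, and it uses the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np


def standardize(values):
    """Column-wise z-score: (x - mean) / std, using the population std (ddof=0)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    return (values - mean) / std


# Hypothetical toy data: two columns with the same shape but very different magnitudes.
toy = np.array([[1.0, 1_000_000.0], [2.0, 3_000_000.0], [3.0, 5_000_000.0]])
scaled = standardize(toy)
print(scaled)
```

After scaling, both toy columns become identical, which illustrates why the dollar-denominated columns no longer outweigh the ratio columns.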
#Creating another copy to append labels later
#Doing this so labels won't influence clustering from k-means to hc
df1=df.copy()
scaled_df1=scaled_df.copy()
Ticker Symbol and Security(Company) have been dropped due to all values being unique.
#Plotting categorical columns together at once
cat_cols = ["GICS Sector", "GICS Sub Industry"]
for feature in cat_cols:
    labeled_barplot(df, feature)
GICS Sector and sub industry distribution remains the same.
#Using function and loop to plot all columns at once with scaled data
for feature in num_cols:
    histogram_boxplot(scaled_df, feature)
Scaled data has similar distributions, but the values are changed to mitigate bias from larger numbers within the data.
%%time
clusters = range(1, 16) #Picking range from 1-15 to test clustering
meanDistortions = []
for k in clusters:
    model = KMeans(n_clusters=k, random_state=0)
    model.fit(scaled_df)
    prediction = model.predict(scaled_df)
    # average Euclidean distance from each point to its nearest centroid
    distortion = (
        sum(
            np.min(cdist(scaled_df, model.cluster_centers_, "euclidean"), axis=1)
        )
        / scaled_df.shape[0]
    )
    meanDistortions.append(distortion)
    print("Number of Clusters:", k, "\tAverage Distortion:", distortion)
plt.plot(clusters, meanDistortions, "bx-")
plt.xlabel("Number of Clusters")
plt.ylabel("Average Distortion")
plt.title("Elbow Plot")
Number of Clusters: 1 Average Distortion: 2.5425069919221697
Number of Clusters: 2 Average Distortion: 2.382318498894466
Number of Clusters: 3 Average Distortion: 2.2683105560042285
Number of Clusters: 4 Average Distortion: 2.1745559827866363
Number of Clusters: 5 Average Distortion: 2.1147830379797616
Number of Clusters: 6 Average Distortion: 2.0872686595048133
Number of Clusters: 7 Average Distortion: 2.008680132690643
Number of Clusters: 8 Average Distortion: 1.9711152823639846
Number of Clusters: 9 Average Distortion: 1.8905345519244967
Number of Clusters: 10 Average Distortion: 1.8568527333547074
Number of Clusters: 11 Average Distortion: 1.8574273605424652
Number of Clusters: 12 Average Distortion: 1.786263657222773
Number of Clusters: 13 Average Distortion: 1.7118438968536325
Number of Clusters: 14 Average Distortion: 1.6825666740823932
Number of Clusters: 15 Average Distortion: 1.6665030856205127
CPU times: user 2.05 s, sys: 27.7 ms, total: 2.08 s
Wall time: 1.55 s
Text(0.5, 1.0, 'Elbow Plot')
From the elbow plot, there is no distinct point to select. There is a slight bend at 5, 7, 9 and 10.
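The quantity plotted on the elbow curve is the mean distance from each observation to its nearest cluster centre. A NumPy-only sketch of that calculation follows; `average_distortion` is a hypothetical helper, and the points and centres are invented so the result can be checked by hand.

```python
import numpy as np


def average_distortion(points, centers):
    """Mean Euclidean distance from each point to its closest centre."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise distances with shape (n_points, n_centers).
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.min(axis=1).mean()


# Hypothetical toy data: two tight pairs of points far apart from each other.
toy_points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
one_center = np.array([[5.0, 1.0]])
two_centers = np.array([[0.0, 1.0], [10.0, 1.0]])

d1 = average_distortion(toy_points, one_center)
d2 = average_distortion(toy_points, two_centers)
print(d1, d2)
```

In the toy case the distortion falls sharply from one centre to two, which is the kind of pronounced elbow that the stock data does not show.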
%%time
sil_score = []
cluster_list = list(range(2, 15))
for n_clusters in cluster_list:
    clusterer = KMeans(n_clusters=n_clusters, random_state=0)
    preds = clusterer.fit_predict(scaled_df)
    score = silhouette_score(scaled_df, preds)
    sil_score.append(score)
    print("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))
plt.plot(cluster_list, sil_score)
plt.xlabel("Number of Clusters")
plt.ylabel("Silhouette Score")
For n_clusters = 2, silhouette score is 0.43969639509980457
For n_clusters = 3, silhouette score is 0.45797710447228496
For n_clusters = 4, silhouette score is 0.4577225970476733
For n_clusters = 5, silhouette score is 0.35515084792732604
For n_clusters = 6, silhouette score is 0.4315903528127779
For n_clusters = 7, silhouette score is 0.4025633625337274
For n_clusters = 8, silhouette score is 0.40485971473985305
For n_clusters = 9, silhouette score is 0.10450448075395784
For n_clusters = 10, silhouette score is 0.12002136446835195
For n_clusters = 11, silhouette score is 0.2178504146887798
For n_clusters = 12, silhouette score is 0.13060376012568126
For n_clusters = 13, silhouette score is 0.1757117204389242
For n_clusters = 14, silhouette score is 0.18300878905519907
CPU times: user 2.4 s, sys: 1 s, total: 3.4 s
Wall time: 1.8 s
Text(0, 0.5, 'Silhouette Score')
From the silhouette scores, k=3 and k=4 perform best (both about 0.46), followed by k=2 (0.44) and k=6 (0.43), with a drastic drop-off at k=9. The score generally decreases as k increases, and the maximum is only about 0.46.
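As a reminder of what these scores measure: for each observation, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the silhouette is (b - a) / max(a, b). The sketch below computes the average by brute force on made-up points; `mean_silhouette` is a hypothetical helper for illustration and assumes at least two clusters.

```python
import numpy as np


def mean_silhouette(points, labels):
    """Average silhouette coefficient using Euclidean distances (needs >= 2 clusters)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from its own cluster
        if not same.any():
            continue  # singleton cluster: score is left at 0
        a = dists[i, same].mean()
        b = min(
            dists[i, labels == other].mean()
            for other in np.unique(labels)
            if other != labels[i]
        )
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


# Hypothetical toy data: two well-separated pairs of points.
toy = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = mean_silhouette(toy, [0, 0, 1, 1])  # labels follow the real grouping
bad = mean_silhouette(toy, [0, 1, 0, 1])   # labels cut across the grouping
print(good, bad)
```

A negative value, as in the mislabelled toy case, means a point sits closer on average to another cluster than to its own, which is how the negative bars in the silhouette plots below should be read.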
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(10, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();
With 10 clusters the large group has been split up. There are some negative scores present.
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(9, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();
Cluster 2 holds the majority of entries with other clusters being smaller in comparison. There are some negative scores present.
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(8, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();
One cluster is dominating the grouping and other clusters are small with some negative scores.
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(7, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();
Cluster 0 holds the majority of entries with other clusters being small in comparison. There are some negative scores present.
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(6, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();
Cluster 0 contains the majority of entries with other clusters being small in comparison. There are negative scores present as well.
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(5, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();
Cluster 1 holds the majority of the entries, with other clusters being much smaller and negative scores present.
#Visualizing k with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(4, random_state=1))
visualizer.fit(scaled_df)
visualizer.show();
Cluster 0 holds the majority of the entries with other clusters being smaller in comparison. Cluster 3 is showing large amounts of negative scoring.
The outliers in the data may be leading to low and negative silhouette coefficient values. At K=7 the average silhouette score is about 0.4. K=7 seems to give the best balance between coefficient values and also gave a slight indication on the elbow plot.
# let's take 7 as number of clusters
kmeans = KMeans(n_clusters=7, random_state=0)
kmeans.fit(scaled_df)
KMeans(n_clusters=7, random_state=0)
# adding kmeans cluster labels to the original and scaled dataframes
df1["K_means_segments"] = kmeans.labels_
scaled_df1["K_means_segments"] = kmeans.labels_
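# numeric_only=True skips the categorical sector columns, which recent pandas versions refuse to average
cluster_profile = df1.groupby("K_means_segments").mean(numeric_only=True)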
cluster_profile["count_in_each_segments"] = (
df1.groupby("K_means_segments")["Current Price"].count().values
)
cluster_profile.style.highlight_max(color="lightblue", axis=0)
| K_means_segments | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio | count_in_each_segments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48.103077 | 6.053507 | 1.163964 | 27.538462 | 77.230769 | 773230769.230769 | 14114923076.923077 | 3.958462 | 3918734987.169230 | 16.098039 | -4.253404 | 13 |
| 1 | 73.281231 | 5.002456 | 1.373721 | 25.303030 | 51.018939 | 5792560.606061 | 1517540458.333333 | 3.773201 | 422805643.026553 | 23.232765 | -3.313539 | 264 |
| 2 | 26.990000 | -14.060688 | 3.296307 | 603.000000 | 57.333333 | -585000000.000000 | -17555666666.666668 | -39.726667 | 481910081.666667 | 71.528835 | 1.638633 | 3 |
| 3 | 108.304002 | 10.737770 | 1.165694 | 566.200000 | 26.600000 | -278760000.000000 | 687180000.000000 | 1.548000 | 349607057.720000 | 34.898915 | -16.851358 | 5 |
| 4 | 632.714991 | 7.374164 | 1.541343 | 19.333333 | 158.333333 | -24046333.333333 | 907393166.666667 | 16.270000 | 125797901.323333 | 123.049240 | 35.355736 | 6 |
| 5 | 95.281515 | 14.717580 | 1.814754 | 25.954545 | 308.909091 | 645568272.727273 | 871490181.818182 | 2.006364 | 730848546.662727 | 57.950455 | 7.992920 | 22 |
| 6 | 37.282919 | -14.529500 | 2.820301 | 40.666667 | 47.555556 | -133624777.777778 | -1904442925.925926 | -4.957037 | 503635899.112593 | 86.787432 | 1.378738 | 27 |
cluster_profile.style.highlight_min(color="orange", axis=0)
| K_means_segments | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio | count_in_each_segments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48.103077 | 6.053507 | 1.163964 | 27.538462 | 77.230769 | 773230769.230769 | 14114923076.923077 | 3.958462 | 3918734987.169230 | 16.098039 | -4.253404 | 13 |
| 1 | 73.281231 | 5.002456 | 1.373721 | 25.303030 | 51.018939 | 5792560.606061 | 1517540458.333333 | 3.773201 | 422805643.026553 | 23.232765 | -3.313539 | 264 |
| 2 | 26.990000 | -14.060688 | 3.296307 | 603.000000 | 57.333333 | -585000000.000000 | -17555666666.666668 | -39.726667 | 481910081.666667 | 71.528835 | 1.638633 | 3 |
| 3 | 108.304002 | 10.737770 | 1.165694 | 566.200000 | 26.600000 | -278760000.000000 | 687180000.000000 | 1.548000 | 349607057.720000 | 34.898915 | -16.851358 | 5 |
| 4 | 632.714991 | 7.374164 | 1.541343 | 19.333333 | 158.333333 | -24046333.333333 | 907393166.666667 | 16.270000 | 125797901.323333 | 123.049240 | 35.355736 | 6 |
| 5 | 95.281515 | 14.717580 | 1.814754 | 25.954545 | 308.909091 | 645568272.727273 | 871490181.818182 | 2.006364 | 730848546.662727 | 57.950455 | 7.992920 | 22 |
| 6 | 37.282919 | -14.529500 | 2.820301 | 40.666667 | 47.555556 | -133624777.777778 | -1904442925.925926 | -4.957037 | 503635899.112593 | 86.787432 | 1.378738 | 27 |
plt.figure(figsize=(20, 35))
plt.suptitle("Boxplot of scaled numerical variables for each cluster", fontsize=20)
for i, variable in enumerate(num_cols):
    plt.subplot(5, 3, i + 1)
    sns.boxplot(data=scaled_df1, x="K_means_segments", y=variable)
plt.tight_layout(pad=2.0)
plt.figure(figsize=(20, 35))
plt.suptitle("Boxplot of original numerical variables for each cluster", fontsize=20)
for i, variable in enumerate(num_cols):
    plt.subplot(5, 3, i + 1)
    sns.boxplot(data=df1, x="K_means_segments", y=variable)
plt.tight_layout(pad=2.0)
Cluster 0:
Cluster 1:
Cluster 2:
Cluster 3:
Cluster 4:
Cluster 5:
Cluster 6:
# list of distance metrics
distance_metrics = ["euclidean", "chebyshev", "mahalanobis", "cityblock"]
# list of linkage methods
linkage_methods = ["single", "complete", "average", "weighted"]
high_cophenet_corr = 0
high_dm_lm = [0, 0]
for dm in distance_metrics:
    for lm in linkage_methods:
        Z = linkage(scaled_df, metric=dm, method=lm)
        # note: pdist defaults to Euclidean, so every metric is scored against Euclidean pairwise distances
        c, coph_dists = cophenet(Z, pdist(scaled_df))
        print(
            "Cophenetic correlation for {} distance and {} linkage is {}.".format(
                dm.capitalize(), lm, c
            )
        )
        if high_cophenet_corr < c:
            high_cophenet_corr = c
            high_dm_lm[0] = dm
            high_dm_lm[1] = lm
Cophenetic correlation for Euclidean distance and single linkage is 0.9232271494002922.
Cophenetic correlation for Euclidean distance and complete linkage is 0.7873280186580672.
Cophenetic correlation for Euclidean distance and average linkage is 0.9422540609560814.
Cophenetic correlation for Euclidean distance and weighted linkage is 0.8693784298129404.
Cophenetic correlation for Chebyshev distance and single linkage is 0.9062538164750717.
Cophenetic correlation for Chebyshev distance and complete linkage is 0.598891419111242.
Cophenetic correlation for Chebyshev distance and average linkage is 0.9338265528030499.
Cophenetic correlation for Chebyshev distance and weighted linkage is 0.9127355892367.
Cophenetic correlation for Mahalanobis distance and single linkage is 0.925919553052459.
Cophenetic correlation for Mahalanobis distance and complete linkage is 0.7925307202850002.
Cophenetic correlation for Mahalanobis distance and average linkage is 0.9247324030159736.
Cophenetic correlation for Mahalanobis distance and weighted linkage is 0.8708317490180428.
Cophenetic correlation for Cityblock distance and single linkage is 0.9334186366528574.
Cophenetic correlation for Cityblock distance and complete linkage is 0.7375328863205818.
Cophenetic correlation for Cityblock distance and average linkage is 0.9302145048594667.
Cophenetic correlation for Cityblock distance and weighted linkage is 0.731045513520281.
#Highest cophenetic correlation
print(
"Highest cophenetic correlation is {}, with {} distance and {} linkage.".format(
high_cophenet_corr, high_dm_lm[0].capitalize(), high_dm_lm[1]
)
)
Highest cophenetic correlation is 0.9422540609560814, with Euclidean distance and average linkage.
The highest value for cophenetic correlation is obtained with Euclidean distance and average linkage. We will now explore Euclidean in more detail with centroid and ward linkage methods.
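One caveat about the comparison above: `pdist(scaled_df)` uses Euclidean distance by default, so the non-Euclidean linkages were scored against Euclidean pairwise distances. A possible variant, shown here only as a sketch on made-up points, computes the pairwise distances once with the chosen metric and reuses them for both the linkage and the cophenetic correlation. `cophenetic_score` is a hypothetical helper, and the values it returns would differ from the ones printed in this notebook.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def cophenetic_score(data, metric="euclidean", method="average"):
    """Cophenetic correlation measured against the same metric used to build the tree."""
    pairwise = pdist(data, metric=metric)   # condensed distance matrix
    Z = linkage(pairwise, method=method)    # linkage accepts condensed distances
    c, _ = cophenet(Z, pairwise)
    return c


# Hypothetical toy data: two compact groups of three points each.
toy = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
)
score_euclid = cophenetic_score(toy, metric="euclidean", method="average")
score_city = cophenetic_score(toy, metric="cityblock", method="average")
print(score_euclid, score_city)
```

Passing condensed distances only suits methods that are defined for arbitrary metrics (single, complete, average, weighted); centroid and ward linkage assume Euclidean geometry and are better given the raw observations.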
# list of linkage methods
linkage_methods = ["single", "complete", "average", "centroid", "ward", "weighted"]
high_cophenet_corr = 0
high_dm_lm = [0, 0]
for lm in linkage_methods:
    Z = linkage(scaled_df, metric="euclidean", method=lm)
    c, coph_dists = cophenet(Z, pdist(scaled_df))
    print("Cophenetic correlation for {} linkage is {}.".format(lm, c))
    if high_cophenet_corr < c:
        high_cophenet_corr = c
        high_dm_lm[0] = "euclidean"
        high_dm_lm[1] = lm
Cophenetic correlation for single linkage is 0.9232271494002922.
Cophenetic correlation for complete linkage is 0.7873280186580672.
Cophenetic correlation for average linkage is 0.9422540609560814.
Cophenetic correlation for centroid linkage is 0.9314012446828154.
Cophenetic correlation for ward linkage is 0.7101180299865353.
Cophenetic correlation for weighted linkage is 0.8693784298129404.
#Highest cophenetic correlation
print(
"Highest cophenetic correlation is {}, with {} linkage.".format(
high_cophenet_corr, high_dm_lm[1]
)
)
Highest cophenetic correlation is 0.9422540609560814, with average linkage.
Again, average linkage gave the highest value for cophenetic correlation. We will now explore the dendrograms to better visualize the clustering and grouping.
# Linkage methods for euclidean distance
linkage_methods = ["single", "complete", "average", "centroid", "ward", "weighted"]
# lists to save results of cophenetic correlation calculation
compare_cols = ["Linkage", "Cophenetic Coefficient"]
fig, axs = plt.subplots(len(linkage_methods), 1, figsize=(15, 30))
for i, method in enumerate(linkage_methods): #Plot each linkage method
    Z = linkage(scaled_df, metric="euclidean", method=method)
    dendrogram(Z, ax=axs[i])
    axs[i].set_title(f"Dendrogram ({method.capitalize()} Linkage)")
    coph_corr, coph_dist = cophenet(Z, pdist(scaled_df))
    axs[i].annotate(
        f"Cophenetic\nCorrelation\n{coph_corr:0.2f}",
        (0.80, 0.80),
        xycoords="axes fraction",
    )
Average linkage has the highest value for cophenetic correlation (0.94) and gives decent grouping in the dendrogram. Ward linkage gives clearer separation, but has a lower score of 0.71. We will proceed with ward linkage, as it gives more distinction between the clusters.
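# note: scikit-learn 1.2 deprecated `affinity` in favor of `metric` and 1.4 removed it; on newer versions pass metric="euclidean" instead
HCmodel = AgglomerativeClustering(n_clusters=4, affinity="euclidean", linkage="ward")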
HCmodel.fit(scaled_df)
AgglomerativeClustering(affinity='euclidean', n_clusters=4)
scaled_df1["HC_Clusters"] = HCmodel.labels_
df1["HC_Clusters"] = HCmodel.labels_
data1["HC_Clusters"] = HCmodel.labels_
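# numeric_only=True skips the categorical sector columns, which recent pandas versions refuse to average
cluster_profile_hc = df1.groupby("HC_Clusters").mean(numeric_only=True)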
cluster_profile_hc["count_in_each_segments"] = (
df1.groupby("HC_Clusters")["Current Price"].count().values
)
# lets display cluster profile
cluster_profile_hc.style.highlight_max(color="lightblue", axis=0)
| HC_Clusters | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio | K_means_segments | count_in_each_segments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48.006208 | -11.263107 | 2.590247 | 196.551724 | 40.275862 | -495901724.137931 | -3597244655.172414 | -8.689655 | 486319827.294483 | 75.110924 | -2.162622 | 5.068966 | 29 |
| 1 | 326.198218 | 10.563242 | 1.642560 | 14.400000 | 309.466667 | 288850666.666667 | 864498533.333333 | 7.785333 | 544900261.301333 | 113.095334 | 19.142151 | 4.666667 | 15 |
| 2 | 42.848182 | 6.270446 | 1.123547 | 22.727273 | 71.454545 | 558636363.636364 | 14631272727.272728 | 3.410000 | 4242572567.290909 | 15.242169 | -4.924615 | 0.000000 | 11 |
| 3 | 72.760400 | 5.213307 | 1.427078 | 25.603509 | 60.392982 | 79951512.280702 | 1538594322.807018 | 3.655351 | 446472132.228456 | 24.722670 | -2.647194 | 1.277193 | 285 |
# lets display cluster profile
cluster_profile_hc.style.highlight_min(color="orange", axis=0)
| HC_Clusters | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio | K_means_segments | count_in_each_segments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48.006208 | -11.263107 | 2.590247 | 196.551724 | 40.275862 | -495901724.137931 | -3597244655.172414 | -8.689655 | 486319827.294483 | 75.110924 | -2.162622 | 5.068966 | 29 |
| 1 | 326.198218 | 10.563242 | 1.642560 | 14.400000 | 309.466667 | 288850666.666667 | 864498533.333333 | 7.785333 | 544900261.301333 | 113.095334 | 19.142151 | 4.666667 | 15 |
| 2 | 42.848182 | 6.270446 | 1.123547 | 22.727273 | 71.454545 | 558636363.636364 | 14631272727.272728 | 3.410000 | 4242572567.290909 | 15.242169 | -4.924615 | 0.000000 | 11 |
| 3 | 72.760400 | 5.213307 | 1.427078 | 25.603509 | 60.392982 | 79951512.280702 | 1538594322.807018 | 3.655351 | 446472132.228456 | 24.722670 | -2.647194 | 1.277193 | 285 |
Checking sectors and sub industries in clusters
cluster_profile_hc_with_cat = data1.groupby("HC_Clusters").mean(numeric_only=True)
cluster_profile_hc_with_cat["count_in_each_segments"] = (
data1.groupby("HC_Clusters")["Current Price"].count().values
)
for sector in data1["HC_Clusters"].unique():
    print("In cluster {}, the following sectors are present:".format(sector))
    print(data1[data1["HC_Clusters"] == sector]["GICS Sector"].unique())
    print()
In cluster 3, the following sectors are present:
['Industrials' 'Health Care' 'Information Technology' 'Consumer Staples' 'Utilities' 'Financials' 'Real Estate' 'Materials' 'Consumer Discretionary' 'Telecommunications Services' 'Energy']
In cluster 1, the following sectors are present:
['Information Technology' 'Health Care' 'Consumer Discretionary' 'Real Estate' 'Telecommunications Services' 'Consumer Staples']
In cluster 0, the following sectors are present:
['Industrials' 'Energy' 'Consumer Discretionary' 'Consumer Staples' 'Materials' 'Financials' 'Information Technology']
In cluster 2, the following sectors are present:
['Financials' 'Consumer Discretionary' 'Information Technology' 'Consumer Staples' 'Health Care' 'Telecommunications Services' 'Energy']
Health Care and Telecommunications Services appear in every cluster except 0.
Industrials and Materials appear only in clusters 3 and 0.
Financials appears in clusters 3, 0 and 2.
IT, Consumer Staples and Consumer Discretionary appear in all clusters.
Utilities appears only in cluster 3.
Real Estate appears only in clusters 3 and 1.
Energy appears in every cluster except 1.
for sub_sector in data1["HC_Clusters"].unique():
    print("In cluster {}, the following sub sectors are present:".format(sub_sector))
    print(data1[data1["HC_Clusters"] == sub_sector]["GICS Sub Industry"].unique())
    print()
In cluster 3, the following sub sectors are present:
['Airlines' 'Pharmaceuticals' 'Health Care Equipment' 'Application Software' 'Semiconductors' 'Agricultural Products' 'MultiUtilities' 'Electric Utilities' 'Life & Health Insurance' 'Property & Casualty Insurance' 'REITs' 'Multi-line Insurance' 'Insurance Brokers' 'Internet Software & Services' 'Specialty Chemicals' 'Semiconductor Equipment' 'Electrical Components & Equipment' 'Asset Management & Custody Banks' 'Specialized REITs' 'Specialty Stores' 'Managed Health Care' 'Electronic Components' 'Aerospace & Defense' 'Home Entertainment Software' 'Residential REITs' 'Water Utilities' 'Consumer Finance' 'Banks' 'Biotechnology' 'Metal & Glass Containers' 'Health Care Distributors' 'Auto Parts & Equipment' 'Construction & Farm Machinery & Heavy Trucks' 'Real Estate Services' 'Hotels, Resorts & Cruise Lines' 'Fertilizers & Agricultural Chemicals' 'Regional Banks' 'Household Products' 'Air Freight & Logistics' 'Financial Exchanges & Data' 'Industrial Machinery' 'Health Care Supplies' 'Railroads' 'Integrated Telecommunications Services' 'IT Consulting & Other Services' 'Drug Retail' 'Integrated Oil & Gas' 'Diversified Chemicals' 'Health Care Facilities' 'Industrial Conglomerates' 'Broadcasting & Cable TV' 'Cable & Satellite' 'Research & Consulting Services' 'Soft Drinks' 'Oil & Gas Exploration & Production' 'Investment Banking & Brokerage' 'Internet & Direct Marketing Retail' 'Building Products' 'Electronic Equipment & Instruments' 'Diversified Commercial Services' 'Retail REITs' 'Automobile Manufacturers' 'Consumer Electronics' 'Tires & Rubber' 'Industrial Materials' 'Oil & Gas Equipment & Services' 'Leisure Products' 'Motorcycle Manufacturers' 'Technology Hardware, Storage & Peripherals' 'Computer Hardware' 'Packaged Foods & Meats' 'Paper Packaging' 'Advertising' 'Trucking' 'Networking Equipment' 'Homebuilding' 'Distributors' 'Multi-Sector Holdings' 'Alternative Carriers' 'Restaurants' 'Diversified Financial Services' 'Home Furnishings' 'Construction Materials' 'Tobacco' 'Oil & Gas Refining & Marketing & Transportation' 'Life Sciences Tools & Services' 'Gold' 'Steel' 'Housewares & Specialties' 'Thrifts & Mortgage Finance' 'Technology, Hardware, Software and Supplies' 'Personal Products' 'Industrial Gases' 'Data Processing & Outsourced Services' 'Human Resource & Employment Services' 'Office REITs' 'Brewers' 'Publishing' 'Specialty Retail' 'Apparel, Accessories & Luxury Goods' 'Household Appliances' 'Environmental Services' 'Casinos & Gaming']
In cluster 1, the following sub sectors are present:
['Data Processing & Outsourced Services' 'Biotechnology' 'Internet & Direct Marketing Retail' 'Restaurants' 'REITs' 'Internet Software & Services' 'Integrated Telecommunications Services' 'Health Care Equipment' 'Soft Drinks' 'Health Care Distributors']
In cluster 0, the following sub sectors are present:
['Building Products' 'Oil & Gas Exploration & Production' 'Oil & Gas Equipment & Services' 'Integrated Oil & Gas' 'Cable & Satellite' 'Household Products' 'Copper' 'Oil & Gas Refining & Marketing & Transportation' 'Diversified Financial Services' 'Application Software']
In cluster 2, the following sub sectors are present:
['Banks' 'Automobile Manufacturers' 'Semiconductors' 'Soft Drinks' 'Pharmaceuticals' 'Integrated Telecommunications Services' 'Integrated Oil & Gas']
Looking at boxplots for cluster distribution
plt.figure(figsize=(20, 35))
plt.suptitle("Boxplot of scaled numerical variables for each cluster", fontsize=20)
for i, variable in enumerate(num_cols):
    plt.subplot(5, 3, i + 1)
    sns.boxplot(data=scaled_df1, x="HC_Clusters", y=variable)
plt.tight_layout(pad=2.0)
plt.figure(figsize=(20, 35))
plt.suptitle("Boxplot of original numerical variables for each cluster", fontsize=20)
for i, variable in enumerate(num_cols):
    plt.subplot(5, 3, i + 1)
    sns.boxplot(data=df1, x="HC_Clusters", y=variable)
plt.tight_layout(pad=2.0)
Cluster 0:
Cluster 1:
Cluster 2:
Cluster 3:
The clustering techniques were similar in execution time, but K-means clustering involved more subjectivity. The elbow plot and silhouette plots did not give a clear choice for the number of clusters. Hierarchical clustering, on the other hand, made it possible to compare distance metrics and linkage methods and pick the best performing combination. The dendrograms made it slightly easier to choose an appropriate number of clusters. Specifically, the ward method gave much more distinct clusters.
For K-means clustering:
There are 7 clusters, with one having 264 observations. There are 3 clusters with only 3, 5 and 6 observations. There are 3 other clusters with 13, 22 and 27 observations. Each cluster is distinguished by holding the highest or lowest average value in one or more fields. The only cluster that does not show this is Cluster 1, which holds the majority of observations and sits near the average across the fields.
For Hierarchical Clustering:
There are 4 clusters, with one having 285 observations. There are 3 other clusters with 11, 15 and 29 observations. The other clusters can be represented by either max or min values across the fields. Cluster 3 shows similarity in the major cluster from K-means in that the distributions are relatively average when compared to other distributions across the fields.
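One way to check how far the two techniques agree is to cross-tabulate their labels, which shows where each K-means segment lands among the hierarchical clusters. The sketch below uses short made-up label lists; `compare_labelings` is a hypothetical helper, and on the notebook's data the inputs would presumably be `df1["K_means_segments"]` and `df1["HC_Clusters"]`.

```python
import pandas as pd


def compare_labelings(labels_a, labels_b, name_a="K_means_segments", name_b="HC_Clusters"):
    """Contingency table counting observations for each pair of cluster labels."""
    a = pd.Series(list(labels_a), name=name_a)
    b = pd.Series(list(labels_b), name=name_b)
    return pd.crosstab(a, b)


# Hypothetical toy labels for seven observations.
toy_kmeans = [0, 0, 1, 1, 2, 2, 2]
toy_hc = [1, 1, 0, 0, 0, 2, 2]
table = compare_labelings(toy_kmeans, toy_hc)
print(table)
```

A row with a single large entry means that K-means segment maps almost entirely onto one hierarchical cluster, while a row spread over several columns indicates that the two methods split those observations differently.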
Cluster 0:
Cluster 1:
Cluster 2:
Cluster 3: